FOREWORD


The Eighteenth Conference on the Design of Experiments in Army Research,
Development and Testing was held 25-27 October 1972 at the Aberdeen Proving
Ground,  Maryland.  The  U.  S.  Army  Test  and  Evaluation  Command  served  as  its 
host.  This  is  the  second  conference  in  this  series  to  be  held  at  the 
Proving  Ground.  The  first  one,  called  the  Sixth  Conference  on  the  Design  of 
Experiments,  was  held  in  October  1960  with  the  Ballistic  Research  Laboratories 
serving as the host. The father of these meetings, Professor S. S. Wilks,
was  in  charge  of  arranging  the  program,  and  the  undersigned  served  as  the 
Chairman  on  Local  Arrangements.  Having  served  once  in  this  capacity,  one 
has  a  better  appreciation  of  the  work-load  faced  this  year  by  the  Local 
Chairman,  Mr.  Gerard  T.  Dobrindt.  Let  me  thank  Mr.  Dobrindt  for  his  excellent 
handling  of  the  many  problems  with  the  physical  arrangements  as  well  as  the 
problems  presented  by  the  attendees.  Thanks  are  also  due  to  Dr.  William 
McIntosh  for  his  guidance  and  assistance  in  many  phases  of  the  on-base 
arrangements.

Professor  John  Tukey,  the  first  invited  speaker,  got  the  conference  off 
to  an  excellent  start  with  his  interesting  and  informative  treatment  of  the 
topic  "Exploratory  Data  Analysis".  He  was  followed  on  the  program  by  one  of 
his colleagues at Princeton University, Professor G. S. Watson. Dr. Watson
discussed  some  recent  developments  in  the  interesting  field  of  "Orientation 
Analysis".  At  the  Second  General  Session  members  of  the  audience  had  the 
pleasure  of  hearing  Professor  J.  S.  Hunter  discuss  one  of  his  papers  on 
"Sequential  Factorial  Estimation"  and  Professor  G.  E.  P.  Box  present  some 
of  his  work  on  "Forecasting  and  Control".  It  is  interesting  to  note  that 
both  Drs.  Box  and  Hunter  served  on  a  panel  discussion  entitled  "Common 
Pitfalls  in  the  Design  and  Analysis  of  Experiments"  at  the  Sixth  Design 
Conference.  The  fifth  invited  speaker  was  Professor  Raymond  H.  Myers  who 
enlightened  members  of  the  audience  on  some  recent  and  important 
developments  in  the  field  of  "Dual  Response  Surface  Analysis".  The  recipient 
of this year's Samuel S. Wilks Memorial Award was one of the above-mentioned
invited  speakers,  namely  Dr.  G.  E.  P.  Box.  We  are  pleased  to  be  able  to 
include  in  these  proceedings  his  acceptance  remarks. 

The  Army  Mathematics  Steering  Committee  sponsors  these  conferences 
on  behalf  of  the  Chief  of  Research  and  Development.  Members  of  this 
committee  would  like  to  thank  the  many  Army  scientists  who  contributed  to 
the  success  of  this  meeting.  Without  their  dedicated  efforts  these  meetings 
would  not  be  repeated  year  after  year.  Scientists  in  other  government  agencies 
have  also  lent  their  talents  to  the  programs.  This  year  we  were  pleased  to 
have  three  contributed  papers  presented  by  members  of  the  National  Bureau  of 
Standards,  and  also  to  have  Dr.  Churchill  Eisenhart  of  the  Bureau  serve  as  a 
member  of  the  Program  Committee.  The  Food  and  Drug  Administration  was 
represented  on  the  agenda  by  Dr.  Clifford  Maloney.  His  services  as  a  member  of 
the Program Committee were also appreciated. One or two Canadian scientists
usually attend and contribute to the discussion at these meetings. This year
we were pleased to have one of our Canadian friends, Mr. G. J. McLaughlin,
of the Defence Research Establishment Valcartier, present one of the clinical
papers.

In  addition  to  the  two  members  of  my  Program  Committee  already  mentioned, 
the  following  individuals  served:  Robert  Bechhofer,  Norman  Coleman,  Gerard 
Dobrindt,  Francis  Dressel,  Walter  Foster,  Boyd  Harshbarger,  William  McIntosh, 
Herbert Solomon, Grace Wahba, and Geoffrey Watson. These gentlemen and one
lady were charged with the responsibility of outlining the general character
of the conference and selecting the invited speakers. My thanks to them
for performing this task in a fashion that again led to a successful
scientific conference. It seems in order at this time to give special mention
to Dr. Walter D. Foster, who is the Chairman of the AMSC Subcommittee on
Probability and Statistics. In this capacity Dr. Foster can be looked upon
as the one generally responsible for initiating the advance and overall
planning for each conference. He takes part in the conduct of many other
phases of these meetings; in particular, he serves as the Chairman of the
committee that organizes the final form of the agenda. He makes the report
to the AMSC on some of the accomplishments of each Design of Experiments
Conference. On behalf of all attendees at these meetings, let me express
our thanks to Dr. Foster for his dedicated efforts to these scientific
conferences.

Finally,  we  desire  to  express  our  sincerest  appreciation  to  Dr.  Francis 
G.  Dressel  whose  many  significant  contributions  make  the  Army  Design  of 
Experiments  Conferences  a  success  from  year  to  year. 


Frank  E.  Grubbs 
Conference  Chairman 
CALLING OF CONFERENCE TO ORDER

Gerard  Dobrindt,  Chairman  for  Local  Arrangements, 

US Army Test and Evaluation Command

WELCOME  TO  THE  U.  S.  ARMY  TEST  AND  EVALUATION  COMMAND 

Benjamin  S.  Goodwin,  Chief  Engineer,  U.  S.  Army  Test  and 
Evaluation  Command 

CHAIRMAN  OF  SESSION  I 

Dr.  Fred  Frishman,  Physical  Sciences  &  Engineering  Division, 
Office,  Chief  of  Research  &  Development,  Washington,  D.  C. 

EXPLORATORY  DATA  ANALYSIS 

Professor  John  Tukey,  Department  of  Mathematics,  Princeton 
University,  Princeton,  New  Jersey 

ORIENTATION  ANALYSIS 

Professor  G.  S.  Watson,  Princeton  University,  Department  of 
Statistics,  Fine  Hall,  Princeton,  New  Jersey 

LUNCH  -  Officers'  Open  Mess,  APG,  MD 

CLINICAL  SESSION  A  -  Library  Conference  Room,  Bldg.  330 
A. Clifford Cohen, Institute of Statistics, University of
Georgia, Athens, Georgia

Bernard  Harris,  Mathematics  Research  Center,  The  University 
of  Wisconsin,  Madison,  Wisconsin 

Boyd  Harshbarger,  Department  of  Statistics,  Virginia 
Polytechnic  Institute  and  State  University,  Blacksburg, 
University, Washington, D. C.
Army Electronics Command, White Sands Missile Range,

New  Mexico 

EXPERIMENTAL  DESIGNS  WITH  LARGE  NUMBERS  OF  VARIABLES 

Roger  L.  Brauer  and  Charles  C.  Lozar,  Architecture  Branch, 
Special  Projects  Division,  U.  S.  Army  Construction  Engineering 
Research  Laboratory,  Champaign,  Illinois 

1300-1445  TECHNICAL  SESSION  1  -  Conference  Room  B,  Building  314 

CHAIRMAN: 

Jerome  R.  Johnson,  US  Army  Materiel  Systems  Analysis  Agency, 
Aberdeen  Proving  Ground,  Maryland 

ORTHOGONAL ESTIMATES IN WEIGHING DESIGNS

William G. Lese, Jr., US Army Materiel Systems Analysis Agency,
Aberdeen  Proving  Ground,  Maryland 

WEIGHING  DESIGNS  FOR  MASS  CALIBRATION 
Malcolm S. Taylor, US Army Aberdeen Research and Development
Center,  Aberdeen  Proving  Ground,  Maryland 
Redstone Arsenal, Alabama

THE  LEAST  SQUARES  ANALYSIS  OF  DATA  GENERATED  BY  A  "PIECE-WISE" 
GENERAL  LINEAR  MODEL 

Robert  L.  Launer,  Procurement  Research  Office,  US  Army 
Logistics  Management  Center,  Fort  Lee,  Virginia 

1445-1515  BREAK 

1515-1700  CLINICAL  SESSION  B  -  Library  Conference  Room,  Building  330 
CHAIRMAN: 

Paul  C.  Cox,  White  Sands  Missile  Range,  New  Mexico 
PANELISTS:

A.  Clifford  Cohen,  Institute  of  Statistics,  University  of 
Georgia,  Athens,  Georgia 

Bernard  Harris,  Mathematics  Research  Center,  The  University 
of  Wisconsin,  Madison,  Wisconsin 

Boyd  Harshbarger,  Department  of  Statistics,  Virginia  Polytechnic 
Institute and State University, Blacksburg, Virginia
University, Washington, D. C.
G. J. McLaughlin, Defence Research Establishment Valcartier,
Courcelette,  P.  Q.,  Canada 
CHAIRMAN:

COL  L.  Ponder,  US  Army  Test  and  Evaluation  Command,  Aberdeen 
Proving  Ground,  Maryland 

EVALUATING AND SCHEDULING PROTOTYPE REQUIREMENTS FOR SUITABILITY
TESTING 

Majors  Richard  B.  Cole  and  William  J.  Owen,  US  Army  Infantry 
Board,  Fort  Benning,  Georgia 

STOPPING  RULES  FOR  SCHEDULING  WITH  PARTICULAR  REFERENCE  TO 
MISSILE  RANGE  SCHEDULING 

Paul H. Randolph, New Mexico State University, Representing
Instrumentation  Directorate,  White  Sands  Missile  Range, 

New  Mexico 

TECHNICAL  SESSION  4  -  Conference  Room  B,  Building  314 
CHAIRMAN: 

John  S.  Hagan,  Materiel  Testing  Directorate,  Aberdeen  Proving 
Ground,  Maryland 

A  ROBUST  CONFIDENCE  INTERVAL  FOR  LOCATION 

Alan  M.  Gross,  Princeton  University,  Department  of  Statistics, 
Fine  Hall,  Princeton,  New  Jersey 

APPROXIMATE  CONFIDENCE  LIMITS  FOR  P(X<Y) 

J.  R.  Moore  and  M.  S.  Taylor,  US  Army  Aberdeen  Research  and 
Development  Center,  Aberdeen  Proving  Ground,  Maryland 

STATISTICAL  EVALUATION  OF  FLIGHT  TEST  PERFORMANCE  OF  THE 
HELICOPTER  LIFT  MARGIN  SYSTEM  (HLMS) 

Erwin Biser and Ronald Kurowsky, Avionics Laboratory, US
Army Electronics Command, Fort Monmouth, New Jersey

SOCIAL HOUR FOLLOWED BY THE BANQUET. PRESENTATION OF THE
SAMUEL  S.  WILKS  MEMORIAL  AWARD 

Dr.  Frank  E.  Grubbs,  Chairman  of  the  Conference 


*  *  *  *  *  Thursday,  26  October  ***** 

TECHNICAL  SESSION  5  -  Library  Conference  Room,  Building  330 
CHAIRMAN: 

Royce  W.  Soanes,  Jr.,  Benet  R&E  Laboratory,  Watervliet 
Arsenal,  Watervliet,  New  York 
AUTOMATED RADAR DATA PROCESSING AT WHITE SANDS MISSILE RANGE
FEATURING  ADAPTIVE  FILTERING  WITH  BIAS  ESTIMATION 

W.  A.  McCool,  Analysis  and  Computation  Division,  White 
Sands  Missile  Range,  New  Mexico 

ALGORITHM  FOR  EDITING  BIVARIATE  DATA  FILES  WITH  RANDOM 
SPACING  IN  THE  INDEPENDENT  VARIABLE 

1LT  L.  Dave  Clements,  Data  Reduction  Section,  Yuma  Proving 
Ground,  Yuma,  Arizona 

TECHNICAL  SESSION  6  -  Conference  Room  B,  Building  314 
CHAIRMAN:

SP/4  Ray  Peterson,  Frankford  Arsenal,  Philadelphia, 

Pennsylvania 

STATISTICAL ANALYSIS OF H. F. OBLIQUE AND VERTICAL INCIDENCE
IONOSPHERIC DATA APPLICABLE TO FIELD ARMY DISTANCES

Richard J. D'Accardi, US Army Electronics Command, Fort
Monmouth,  New  Jersey 

Chris  P.  Tsokos,  University  of  South  Florida,  Tampa,  Florida 

COMPARISON  OF  THE  TRANSMISSION  THROUGH  FOG  OF  THE  3-5  AND 
8-12  MICRON  SPECTRAL  REGIONS  AS  A  FUNCTION  OF  THE  VISIBLE 
TRANSMISSION 

James E. Perry and Stuart Laymar, Night Vision Laboratory,
USAECOM, Fort Belvoir, Virginia

TECHNICAL  SESSION  7  -  Conference  Room  A,  Building  314 


CHAIRMAN: 

Col.  George  T.  Morris,  Jr.,  US  Army  Test  and  Evaluation 
Command,  Aberdeen  Proving  Ground,  Maryland 

MAXIMUM  LIKELIHOOD  ESTIMATION  PROCEDURES  IN  RELIABILITY 
GROWTH 

Larry  H.  Crow,  US  Army  Materiel  Systems  Analysis  Agency, 

Aberdeen  Proving  Ground,  Maryland 

MODIFIED  PROPAGATION  OF  ERRORS  WITH  APPLICATIONS  TO  MAINTAINABILITY 
AND  AVAILABILITY 

Paul  C.  Cox,  White  Sands  Missile  Range,  White  Sands, 

New Mexico
1130-1300

1300-1415 


CHAIRMAN: 

George L. Kinnett, HQ, US Army Aviation Materiel Laboratories,
Fort Eustis, Virginia

WIND  TUNNEL  MODIFICATION  AND  EVALUATION 

E.  G.  Peterson,  C.  E.  Sperry  and  E.  Covert,  Deseret  Test 
Center,  Building  100,  Soldiers'  Circle,  Fort  Douglas,  Utah 

TECHNICAL  SESSION  9  -  Library  Conference  Room,  Building  330 

CHAIRMAN: 

Edward Flake, Product Assurance Director, Edgewood Arsenal,
Maryland 

TECHNIQUES  FOR  TAIL  LENGTH  ANALYSIS 

James  J.  Filliben,  Statistical  Engineering  Laboratory, 
Institute  for  Basic  Standards,  National  Bureau  of  Standards, 
Washington, D. C.


CRITERIA  FOR  A  BIOCELLULAR  MODEL  -  BIOCELLULAR  COMMUNICATION 
George  I.  Lavin,  Vulnerability  Laboratory,  BRL,  ARDC, 
EXPLORATORY DATA ANALYSIS AS PART OF A LARGER WHOLE*
John  W.  Tukey 

Princeton  University,  Princeton,  New  Jersey 


(A)  Most  data  analysis  should  be  investigative 

It is not enough to look for what we anticipate. The greatest gains
from  data  come  from  surprises.  We  will  usually  not  be  very  surprised, 
but  we  should  try  to  be. 

(AA)  Data  analysis  is  well  thought  of  in  three  phases 

As  we  come  to  think  over  the  process  of  analyzing  data,  when  done  well, 
we  can  hardly  fail  to  identify  the  unrealism  of  the  descriptions  given  or 
implied  in  our  texts  and  lectures.  The  description  I  am  about  to  give 
emphasizes  three  kinds  of  stages.  It  is  more  realistic  than  the  description 
we  are  accustomed  to  but  we  dare  not  think  it  (or  anything  else)  the  ultimate 
in  realism. 

The  first  stage  is  exploratory  data  analysis,  which  does  not  need 
probability,  significance,  or  confidence, and  which,  when  there  is  much  data, 
may  need  to  handle  only  either  a  portion  or  a  sample  of  what  is  available. 

That  there  is  still  much  to  be  said  and  that  there  are  new  simple  techniques 
to  be  developed  is  testified  to  by  3  volumes  of  a  book  now  in  a  limited 
preliminary edition (Tukey 1970-1971) which deals only with the simpler
questions,  leaving  multiple  regression  and  related  questions  for  later  treatment. 

The  second  stage  is  probabilistic.  Rough  confirmatory  data  analysis 
asks, perhaps quite crudely: "With what accuracy are the appearances already
found to be believed?" Three answers are reasonable:


-  The  appearances  are  so  poorly  defined  that  they  can  be  forgotten 
(at  least  as  evidence  though  probably  not  as  clues). 

-  The  appearances  are  marginal  (so  that  crude  analysis  may  not  suffice 
and a more careful analysis is called for).

-  The  appearances  are  well-determined  (when  we  may,  but  more  often 
do  not,  have  grounds  for  a  more  careful  analysis). 

Among the key issues of such a second stage are the issues of multiplicity:
How  many  things  might  have  been  looked  at?  How  many  had  a  real  chance  to  be 
looked  at?  How  should  the  multiplicity  decided  upon,  in  answer  to  these 
questions,  affect  the  resulting  confidence  sets  and  significance  levels?  These 
are  important  questions;  their  answers  can  affect  what  we  think  the  data 
has  shown. 


*At  the  Eighteenth  Conference  on  the  Design  of  Experiments,  Professor  Tukey 
issued  this  outline  of  his  address.  It  contains  several  references  for 
those  interested  in  pursuing  this  topic. 
It will only be after we have become used to dealing with the
issues of multiplicity that we will be psychologically ready to deal
effectively with correlated estimates, to recognize in particular (a)
that the higher the correlation the less the chance -- NOT THE GREATER --
of one or more accidental significances and (b) that correlation of
fluctuations need imply nothing as to whether the real effects measured
by one calculated quantity will in any way "leak" into other calculated
quantities. Leakage of fluctuation and leakage of effect NEED NOT
go together, though they sometimes do.
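Point (a) can be checked with a small simulation. The sketch below is an illustration under assumptions that are not in the text: m test statistics, each standard normal when nothing is going on, sharing a common pairwise correlation rho (built from one shared component), with "significance" taken to mean |z| > 1.96. The function name and its parameters are hypothetical.

```python
import math
import random


def prob_any_significant(m, rho, reps=20000, crit=1.96, seed=1):
    """Estimate P(at least one |z_i| > crit) for m null z-statistics that
    share a common pairwise correlation rho (0 <= rho < 1)."""
    rng = random.Random(seed)
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    hits = 0
    for _rep in range(reps):
        shared = rng.gauss(0.0, 1.0)  # component common to all m statistics
        # z_i = sqrt(rho)*shared + sqrt(1-rho)*own noise, so Var(z_i) = 1
        # and Corr(z_i, z_j) = rho for i != j.
        if any(abs(a * shared + b * rng.gauss(0.0, 1.0)) > crit
               for _i in range(m)):
            hits += 1
    return hits / reps


if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.9):
        print(f"rho = {rho}: P(one or more |z| > 1.96) ~ "
              f"{prob_any_significant(10, rho):.3f}")
```

With rho = 0 the estimate should come out near the independent-tests value 1 - 0.95^10, about 0.40; as rho grows toward 1 it falls, because highly correlated statistics tend to be significant together or not at all.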

When  the  result  of  the  second  stage  is  marginal,  we  need  a 
third  stage,  in  which  we  wish  to  muster  whatever  strength  the  data 
before  us  possesses  that  bears  directly  on  the  question  at  issue  —  and  in 
which  we  often  also  want  to  borrow  strength  from  either  other  aspects  of 
the  same  body  of  data  or  from  other  bodies  of  data.  It  is  at  this  stage 
of  "mustering  and  borrowing  strength"  that  we  require  our  best  statistical 
techniques.  Medians  may  be  quite  good  enough  for  our  rough  confirmatory 
analysis,  but  if  we  have  good  robust  measures  of  location  they  are  needed 
in  mustering  and  borrowing  strength. 

To  argue,  as  we  have  implicitly  done  so  often  in  the  past,  that 
—  (1)  all  data  requires  mustering  and  borrowing  of  strength  and  (2)  this 
can  —  nay  should  —  be  done  without  any  exploratory  data  analysis  —  is 
surely at least one of the minor heights of unrealism. Trying to make
what  needs  to  be  data  investigation  into  data  processing  that  really  meets 
our  needs  involves  many  new  ideas,  and  ideas  come  slowly. 

(AAA) Novel ad hoc analyses need not bar us from confirmatory analysis

To be clear that this is so, we must be prepared to face up to two
points:

-  questions  of  multiplicity  are  not  going  to  be  avoided. 

-  approximate  confidence  and  significance  procedures  are  quite 
good  enough. 

Once  we  do,  the  jackknife+  will  give  us  adequate  confirmatory 
assessment;  our  only  struggle  will  be  with  assessing  degrees  of  multiplicity. 
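For readers who have not met it, the jackknife can be sketched in a few lines. The version below is the usual leave-one-out form with pseudo-values; the function name and interface are illustrative assumptions. Recompute the statistic T with each observation left out in turn, form pseudo-values n*T - (n-1)*T_(-i), and treat these as roughly independent values, so that their mean and its standard error give the jackknifed estimate and a rough standard error for confirmatory use.

```python
import math
import statistics


def jackknife(xs, estimator):
    """Return (jackknifed estimate, jackknife standard error) of
    estimator(xs), using leave-one-out pseudo-values.  xs must be a list."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    full = estimator(xs)
    pseudo = []
    for i in range(n):
        left_out = estimator(xs[:i] + xs[i + 1:])  # statistic without x_i
        pseudo.append(n * full - (n - 1) * left_out)
    estimate = statistics.mean(pseudo)
    std_err = statistics.stdev(pseudo) / math.sqrt(n)
    return estimate, std_err


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(jackknife(data, statistics.mean))
```

For the mean, the pseudo-values are just the observations, so the jackknife returns the familiar s/sqrt(n); its value lies in applying the same few lines, unchanged, to statistics that have no convenient variance formula. (It is known to be unreliable for the median and other non-smooth statistics.)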

(Waiting  for  specific  statistical  theory  for  specific  analyses  is 
unsound.  We  have  to  wait  too  long,  and  —  what  is  worse  —  we  get  theory 
based  on  too  narrow  assumptions.) 


+  For  an  introductory  account  see: 

Mosteller, F. and Tukey, J.W. (1968). Data analysis, including statistics.
Handbook of Social Psychology, 2nd edition, vol. 2. G. Lindzey and E. Aronson,
editors, Addison-Wesley, Reading, Massachusetts, 80-203.
SOME PRINCIPLES
THAT SHOULD GUIDE
EXPLORATORY DATA ANALYSIS
(A) Walk first, run later

"It is well to understand what you can do before you learn how to
measure how well you seem to have done it".

(AA) Don't wait for running shoes. Start now
(AAA) Data analysis should be investigative

"Exploratory data analysis is detective work -- numerical detective
work -- or counting detective work -- or graphical detective work".

(AAAA) Resistant techniques should be the usual beginning

A technique is resistant if changing a small part of the data will
have only small effects on the result, no matter what is done to the small
part. (Means are not resistant, but medians are.)
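The definition can be checked directly: corrupt one observation at a time with a wild value and record the largest change this causes in the result. The function name, the batch of numbers, and the wild value 10,000 below are hypothetical.

```python
import statistics


def max_shift_after_corruption(data, estimator, bad_value):
    """Replace one observation at a time by bad_value and return the
    largest resulting change in estimator(data)."""
    base = estimator(data)
    shifts = []
    for i in range(len(data)):
        corrupted = list(data)
        corrupted[i] = bad_value          # "no matter what is done" to one value
        shifts.append(abs(estimator(corrupted) - base))
    return max(shifts)


if __name__ == "__main__":
    data = [12, 15, 11, 14, 13, 16, 12]
    for name, est in (("mean", statistics.mean),
                      ("median", statistics.median)):
        print(name, max_shift_after_corruption(data, est, 10_000))
```

For this batch the median moves by at most 1 whichever single value is corrupted, while the mean moves by more than 1,400.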

(AAAAA) Analyses should come before summaries

Before  we  summarize,  we  should  analyze,  and  look  at  the  analysis. 
Here  an  analysis  is: 

a conversion -- usually a breakdown -- of numerical data into
other numbers that is both reversible and relevant.

Reversible  means  that  you  can  get  all  the  data  back  IN  DETAIL 
from  the  results  of  analysis.  Relevant  means  that  some,  at  least,  of  the 
results  of  analysis  illuminate  each  of  the  questions  most  likely  to  arise. 
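As a sketch of one reversible analysis, here is the familiar breakdown of a two-way table into a common value, row effects, column effects, and residuals, computed with arithmetic means (the kind of fit used in Analysis 1 of the sugar beet example). The function name and the small table are hypothetical; the table is additive except for one wild value in its corner.

```python
import math


def row_plus_column_by_means(table):
    """Break a two-way table into common + row effects + column effects
    + residuals using arithmetic means.  The breakdown is reversible:
    table[i][j] == common + row[i] + col[j] + resid[i][j]."""
    nrow, ncol = len(table), len(table[0])
    common = sum(sum(r) for r in table) / (nrow * ncol)
    row = [sum(r) / ncol - common for r in table]
    col = [sum(table[i][j] for i in range(nrow)) / nrow - common
           for j in range(ncol)]
    resid = [[table[i][j] - common - row[i] - col[j] for j in range(ncol)]
             for i in range(nrow)]
    return common, row, col, resid


if __name__ == "__main__":
    # Hypothetical 3 x 3 table: additive except for the wild value 100.
    t = [[10, 12, 14], [11, 13, 15], [12, 14, 100]]
    common, row, col, resid = row_plus_column_by_means(t)
    # Reversibility: every entry is recovered from the four pieces.
    assert all(math.isclose(common + row[i] + col[j] + resid[i][j], t[i][j])
               for i in range(3) for j in range(3))
    for r in resid:
        print(["%.2f" % e for e in r])
```

Nothing is lost, since every entry comes back as common + row + column + residual. But with means the single wild value disturbs every residual: the eight clean cells get residuals of about 9.33 or -18.67, and the wild cell only 37.33.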

We  will  come  to  examples  a  little  later. 

(AAAAAA)  In  routine  analysis  the  client  should  be  presented  with  at 
least  two  different  versions 


If the two versions "agree", fine. Let some summary of one be
published.


If  the  two  versions  "disagree",  the  client  must  think  —  very 
painful,  but  forcing  this  may  be  the  best  thing  the  statistician  can  do. 





(AAAAAAA) "Looking at the data" implies both MORE NUMBERS and BETTER
PICTURES


The unexpected is best brought to our attention by pictures.
Failing this, as is always to some extent necessary, more numbers can
and do help.

(AAAAAAAA) Implicitly defined -- and hence iteratively calculated --
analyses are inevitable

We have been frightened too long by some mixture of the apparent
difficulties of hand calculation and the inaptness of mathematical
formulas. Some implicitly defined, iterative calculations are "as easy"
and "at least as safe from error" as those that use arithmetic means.
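As one example of an implicitly defined estimate (Huber's M-estimate of location, chosen here for illustration; the text names no particular one), the location T is defined as the root of sum psi((x_i - T)/s) = 0 with psi(u) = max(-k, min(k, u)). There is no closed form, but a short iteration finds it. The tuning constant k = 1.5 and the fixed scale s = MAD/0.6745 are common conventions assumed here, and the function name is hypothetical.

```python
import statistics


def huber_location(xs, k=1.5, tol=1e-10, max_iter=100):
    """Huber M-estimate of location, found iteratively.

    Solves sum(psi((x - T) / s)) = 0 with psi(u) = max(-k, min(k, u)),
    holding the scale s fixed at MAD / 0.6745 (an assumed convention)."""
    t = statistics.median(xs)                         # resistant start
    mad = statistics.median([abs(x - t) for x in xs])
    s = mad / 0.6745 if mad > 0 else 1.0
    for _ in range(max_iter):
        # Pull each observation in to within k*s of the current T and
        # average; a fixed point of this step solves the defining equation.
        pulled = [t + s * max(-k, min(k, (x - t) / s)) for x in xs]
        new_t = sum(pulled) / len(pulled)
        if abs(new_t - t) < tol:
            return new_t
        t = new_t
    return t


if __name__ == "__main__":
    print(huber_location([1, 2, 3, 4, 5]))     # about 3.0
    print(huber_location([1, 2, 3, 4, 100]))   # near 3.06; the mean is 22
```

Each pass is only a clipped average, so the iteration is as easy to carry out as a mean, and on [1, 2, 3, 4, 100] it settles near 3.06 within a few dozen passes.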

(*********) Not only mathematical statistics, but also data analysis,
is going to have to become more like biochemistry.

(added  March  1972) 

The  greatest  danger  of  an  applied  mathematical  science  is  the 
tacit assumption of


OMNIFERENCE 

of the assumption that both users and bystanders, from knowing exactly
what is done, will be able to draw -- and will, in fact, draw -- the relevant
inferences concerning the behavior of every technique at hand. To have
direct omniference by as many users as possible about as many techniques
as  possible  is  a  very  good  thing.  But  to  avoid  a  technique,  because 
omniference  is  hard  or  impossible,  can  be  very  unwise. 

We  ought  all  to  expect  that  users  are  to  be  told  the  best 
information about data analysis techniques that is available -- whether
or not they can afford the effort to understand how that information was
gained -- and whether or not the information is proved, provable, or even
subject  to  confidence  statements. 


EXPLORATORY  DATA  ANALYSES 

IN RELATION TO

PROBABILITY  MODELS 

(A)  Probability  models  are  to  give  results  for  guidance. 

As statisticians we must take the major share of responsibility
here. We ought to make the existence, nature, and details of probability
models openly available to all -- encouraging their perusal. But we
ought not shove them down throats in the early stages of learning.

Mathematics  is  the  only  possible  scientific  discipline  in  which 
responsibility  can  be  completely  avoided  —  by  teaching  every  student 
all  the  proofs,  thus  making  him  responsible  for  the  validity  of  all  the 
mathematics  he  has  thus  learned.  No  one  else  can  avoid  responsibility 
this way. If, as statisticians, we are concerned with the analysis of data,
we  cannot  escape. 

Understanding what comes of carefully formulated probability
situations is of the essence. Rarely will it be directly and precisely
applicable  to  our  problems.  Only  as  we  carefully  broaden  the  bases  on 
which  it  is  built  will  we  bring  it  closer  and  closer  to  direct  application. 

(AA) In Exploratory Data Analysis, 50% efficiency is plenty

If 50% efficiency will not reveal an effect, 95% will not make
it significant.
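A rough worked version, assuming (as is usual in large samples) that an estimate of efficiency E has variance proportional to 1/E, so that the standardized size z of a given effect grows like the square root of E:

\[
\frac{z_{0.95}}{z_{0.50}} = \sqrt{\frac{0.95}{0.50}} = \sqrt{1.9} \approx 1.38 .
\]

An effect that stands at z of about 1 with a 50% efficient procedure would stand at only about 1.38 with a 95% efficient one, still short of the conventional 1.96; to reach 1.96 at 95% efficiency it would already have to show about 1.96/1.38, roughly 1.42, at 50%.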

(AAA)  Simplicity  and  flexibility  outweigh  efficiency 

Recall  Churchill  Eisenhart's  definition  of  the  "practical  power" 
of  a  statistical  test: 


The product of the probability that the test will be applied and
the  mathematical  power  when  applied. 
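Written as a formula, with purely hypothetical numbers for illustration:

\[
\text{practical power} = P(\text{test is applied}) \times P(\text{reject} \mid \text{applied}).
\]

A simple test that is used 90% of the time and has power 0.70 when used has practical power 0.90 × 0.70 = 0.63, while a more powerful but cumbersome test used only 40% of the time, with power 0.85, has 0.40 × 0.85 = 0.34.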

(AAAA)  As  we  learn  from  broader  probability  models,  we  will  be 
better  guided  in  Exploratory  Data  Analysis 
TWO-WAY TABLES
OF
RESPONSES


We have analyzed many hundreds of thousands (at least) of
two-way tables of responses by fitting something of the form

common + row + column

Almost  all  of  this  has  been  done  by  explicit  arithmetic  means. 

These are really used as algorithms to meet implicit conditions
that  certain  arithmetic  means  of  residuals  and  effects  are  zero. 

The results are very NON-resistant -- and hence very NON-robust
of efficiency. We can no longer live with them as the only approach.

Implicit  medians  do  quite  well,  and  are  not  hard  to  apply. 

As an example, let us look at data from page 103 of the
Rothamsted Field Experiments of 1969. The analysis by means hardly shows
that  anything  is  going  on  among  the  residuals.  The  analysis  by  implicit 
medians  calls  at  least  two  things  to  our  attention. 


DATA, weight of sugar beet roots in 0.01 ton

            N1      N3      N5      N7    (Means)
DC        1468    1597    1670    1647    (1596)
ST         980    1237    1379    144?    (1260)
PT         967    1156    1457    1511    (1273)
CM        1304    1411    1470    1444    (1415)
PJ         912    1234    1325    1374    (1211)
PS         963    1234    1351    1367    (1228)
(Means)  (1104)  (1311)  (1442)  (1462)   (1330)


ANALYSIS 1. by explicit or implicit means

            N1      N3      N5      N7     (eff)
DC          94      20     -38     -83      266
ST         -52      14       3      51      -70
PT         -80     -98      72     104      -57
CM         145      15     -57    -105       85
PJ         -73      42       2      29     -119
PS         -39      25      11       2     -102
(eff)     -226     -19     112     132     1330


ANALYSIS 2. by implicit medians

            N1      N3      N5      N7     (eff)
DC         176      22     -21     -67      289
ST          -6     -32       6      41      -17
PT         -53    -147      62      73       17
CM         230      22     -21     -80      103
PJ         -25       6      -5      11      -58
PS           6      -6       7     -11      -41
(eff)     -334     -51      51      84

Note: N1, N3, N5, N7 are four levels of added nitrogen.
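A fit by implicit medians can be computed by Tukey's median polish: alternately sweep the median out of each row and out of each column until nothing more moves. The sketch below follows the common formulation (similar in structure to R's medpolish), which also keeps the medians of the row effects and of the column effects at zero by moving them into the common term. The function name, the iteration limit, and the small hypothetical table are assumptions, and the sketch is not claimed to reproduce Analysis 2 digit for digit.

```python
import statistics


def median_polish(table, max_iter=10, tol=1e-9):
    """Fit table[i][j] = common + row[i] + col[j] + resid[i][j] by
    alternately sweeping row and column medians out of the residuals."""
    resid = [[float(x) for x in r] for r in table]
    nrow, ncol = len(resid), len(resid[0])
    common, row, col = 0.0, [0.0] * nrow, [0.0] * ncol
    for _ in range(max_iter):
        moved = 0.0
        # Row sweep: move each row's median residual into that row's effect.
        for i in range(nrow):
            m = statistics.median(resid[i])
            resid[i] = [x - m for x in resid[i]]
            row[i] += m
            moved += abs(m)
        # Keep column effects centered: their median goes to the common term.
        m = statistics.median(col)
        col = [c - m for c in col]
        common += m
        # Column sweep: move each column's median residual into its effect.
        for j in range(ncol):
            m = statistics.median([resid[i][j] for i in range(nrow)])
            for i in range(nrow):
                resid[i][j] -= m
            col[j] += m
            moved += abs(m)
        # Keep row effects centered likewise.
        m = statistics.median(row)
        row = [r - m for r in row]
        common += m
        if moved < tol:
            break
    return common, row, col, resid


if __name__ == "__main__":
    # Hypothetical table: additive except for one wild value (100 where 16 fits).
    t = [[10, 12, 14], [11, 13, 15], [12, 14, 100]]
    common, row, col, resid = median_polish(t)
    print(common, row, col)
    for r in resid:
        print(r)
```

On this table median polish gives common 13, row effects -1, 0, 1 and column effects -2, 0, 2, leaves every clean residual at zero, and isolates the discrepancy (84) in its own cell, whereas a fit by means spreads it over the whole table.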
OTHER KINDS
OF ANALYSIS
CONSIDERED BRIEFLY


(A) When regression is for residuals, as usually in the analysis
of covariance, for example, we often need structural regression, rather than
predictive regression.

Stripping out an effect can be much more important than minimizing
residuals.

(AA)  Almost  all  applications  of  spectrum  analysis  are  exploratory 
in  nature 


Where  spectrum  analysis  has  helped,  it  has  been  because  of  what 
it  has  shown  to  us. 


(AAA)  Numerical  classification  (numerical  taxonomy,  cluster  analysis, 
etc.) has been an unrecognized battleground between exploratory and
confirmatory  data  analysis 

The  techniques  of  steadily  increasing  effectiveness  pushed  onward 
by W.T. Williams and G.N. Lance are essentially exploratory in nature.

The  views  of  N.  Jardine  and  R.  Sibson,  to  pick  an  antithesis,  are  basically 
confirmatory.  (Rather  than  facing  the  multiplicity  problem  ((see,  for 
example, Day, N.E. (1969) Biometrika 56, 470-473)) the most usual reaction
has  been  one  of  fear  and  retreat  to  axioms  and  abstract  criteria.) 

(AAAA)  Almost  all  of  multivariate  analysis  has  suffered  from  an 
emphasis  on  confirmatory  data  analysis,  to  the  concealment  of  what  might 
have  been  seen 


(The nearly complete book of R. Gnanadesikan on multivariate data
analysis is a valuable first step forward.) (Canonical analysis, in the
sense of M.J.R. Healy et al., is an outstanding example of improved data
insight by exploratory methods.)

(AAAAA)  Contingency  tables  can  often  be  analyzed,  not  just  summarized 

It  is  their  analysis  that  offers  an  effective  foothold  for  their 
exploratory  data  analysis. 

(******) Effective local techniques in multivariate analysis seem
likely to depend on near-volume indicators. (added March 1972)

Cells and grids are useful in one dimension from moderate amounts of
data up, in two dimensions from moderately large amounts up, and in three
dimensions from quite large amounts of data up. To work in four or more
dimensions with any feasible amounts of data, or to work in three or two
with lesser amounts of data, we have to do something else. The kth nearest
volume, for an appropriate shape of neighborhood, seems likely to fill this
gap.
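One way to read "kth nearest volume" (our interpretation, essentially the k-nearest-neighbor density estimate, not a method spelled out in the text): at a point x, find the distance r to the kth nearest data point, take the volume V of a d-dimensional ball of that radius, and use k/(nV) as a local density indicator. Euclidean balls are an assumption here, since the text leaves the shape of neighborhood open; the function names and data are hypothetical.

```python
import math
import random


def ball_volume(d, r):
    """Volume of a d-dimensional Euclidean ball of radius r."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1) * r ** d


def knn_density(x, points, k):
    """kth-nearest-volume density indicator at x: k / (n * V), where V is
    the volume of the smallest ball about x holding k of the n points."""
    dists = sorted(math.dist(x, p) for p in points)
    r_k = dists[k - 1]                  # distance to the kth nearest point
    return k / (len(points) * ball_volume(len(x), r_k))


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical data: a tight cluster at the origin plus a thin scatter.
    pts = [(rng.gauss(0, 0.1), rng.gauss(0, 0.1)) for _ in range(200)]
    pts += [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(20)]
    for where in [(0.0, 0.0), (3.0, 3.0)]:
        print(where, round(knn_density(where, pts, k=10), 4))
```

Because only distances are needed, the same function works unchanged in four or more dimensions, which is where cells and grids run out of data.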
EXAMPLES*
OF
BETTER PICTURES

1) to 4)      Stem-and-leaf
5) to 7)      Schematic plots
8) to 9)      Residuals from lines
10) to 13)    Row-PLUS-column fits
14)           Bar diagrams may need bow legs!
15) to 19)    Rootograms for amounts or balances
20) to 22)    Rootograms for counts
23) to 24)    Rank-size-log plots
25) to ?)     Counting-in

There needs to be a
GOOD PICTURE
in response to
EVERY
type of question
FREQUENTLY ASKED


* Taken or adapted from John W. Tukey, Exploratory Data Analysis.
Limited preliminary edition (3 volumes). Copyright 1970, 1971,
Addison-Wesley Publishing Company.




CLOSE 


Some would call -- out loud, or in their minds -- exploratory
data analysis "just descriptive statistics". Those who take this view
must believe that "descriptive" statistics is a horrible misnomer. For
I hope I have shown that exploratory data analysis is actively incisive
rather than passively descriptive, with a real emphasis on the discovery
of the unexpected -- if necessary by figuratively knocking the analyst's
head against the wall until he notices it.

Data analysis should customarily, if not routinely, be
investigative. Quantitative detective work has to be a professional
responsibility.

Undoubtedly,  the  swing  to  exploratory  data  analysis  will  go 
somewhat  too  far.  However: 

It  is  better  to  ride  a  damped  pendulum 
than  to  be  stuck  in  the  mud. 




THE  STATISTICS  OF  DIRECTIONS 


G.  S.  Watson 
Princeton  University 


INTRODUCTION. An analysis of directional data is required in many
fields of research. The writer's work was stimulated first by studies
of the direction of permanent magnetization of ancient rocks (palaeomagnetism
-- see Irving (1964)) and then by studies of bird navigation.
These yield examples of directions in three and two dimensions respectively.
Like so much of statistics, the development of the required
theory and methods owes much to a paper by Fisher (1953). The literature
has been recently summarized in a book by Mardia (1972).

The  subject  not  only  has  practical  interest  —  it  has  theoretical 
interest. The tools of statistics -- means, medians, variances, distribution
functions, etc. -- are all fashioned for the real line on which
the  observations  are  points.  When  they  are  points  on  a  circle  or  a 
sphere,  one  must  start  afresh  —  none  of  the  tools  just  mentioned  make 
sense  any  more.  Creating  a  new  set,  although  a  simple  job,  has  given 
the  writer  more  pleasure  than  any  of  his  other  statistical  work.  Other 
interest  stems  from  the  compact  nature  of  the  circle  and  the  sphere 
which  makes  things  simpler  than  the  line  —  but  this  aspect  is  not 
appropriate  for  further  discussion  today. 

2.  Data  and  its  summary  descriptions.  In  two  dimensions,  a
direction  may  be  thought  of  as  an  angle,  a  point  on  a  circle  or  a  unit
vector.  To  display  data  in  angles  one  could  show  it  on  a  straight
line  of  length  360°  and  find  the  mean,  median,  variance,  etc.  But  this
is  surely  wrong,  since  it  supposes  that  observations  of  1°  and  359°  are
far  apart.  If  all  the  data  is  concentrated  around  180°,  no  great  harm
is  done,  of  course.  If  it  is  shown  on  a  circle  no  such  problem  arises,
but  new  tools  are  required.  If  we  think  of  it  as  $\mathbf{r}_1, \ldots, \mathbf{r}_N$,
a  bunch  of  unit  vectors,  the  new  concepts  immediately  suggest  themselves.


The  direction  of  the  mean  vector

$$\frac{1}{N}\sum_{i=1}^{N} \mathbf{r}_i$$

is  suggested  as  a  center  of  the  sample.  $\mathbf{R} = \sum_{i=1}^{N} \mathbf{r}_i$  is  the  vector
resultant  of  the  sample,  and  we  define  the  sample  mean  direction  to  be  a
unit  vector  in  the  direction  of  $\mathbf{R}$.  Thus,  instead  of  having,  e.g.,  the
mean  of  directions  1°  and  359°  as  180°  (!!),  it  comes  out  sensibly  as
0°  =  360°.
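As  a  quick  check  of  the  1°  and  359°  example,  here  is  a  minimal  Python  sketch  (our  illustration,  not  code  from  the  paper)  that  turns  each  angle  into  a  unit  vector,  sums  the  vectors,  and  reports  the  direction  of  the  resultant.  The  function  name  `mean_direction_deg`  is  hypothetical.

```python
import math

def mean_direction_deg(angles_deg):
    """Sample mean direction of 2-D directions given in degrees.

    Each angle a is treated as the unit vector (cos a, sin a); the mean
    direction is the direction of the resultant R = r_1 + ... + r_N.
    Returns (mean direction in [0, 360), resultant length R).
    """
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    R = math.hypot(c, s)
    if R == 0.0:
        raise ValueError("zero resultant: mean direction is undefined")
    return math.degrees(math.atan2(s, c)) % 360.0, R

angles = [1.0, 359.0]
naive = sum(angles) / len(angles)      # linear mean on the 0-360 line: 180.0
mean, R = mean_direction_deg(angles)   # lies at 0 deg (equivalently 360 deg)
print(naive, mean, R)
```

Because  of  rounding,  `mean`  may  print  as  a  value  extremely  close  to  0  or  to  360;  both  denote  the  same  direction.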
The  graphical  display  of  directional  data  is  shown  in  Figures  1a  and  1b.
Figure  1c  shows  how  undirected  lines  are  plotted;  these  are  called
axes  and  need  a  different  treatment  from  directions.  Figures  1b  and  1c
together  indicate  that  the  axes  of  slump  folds  tend  to  be  parallel  with
the  palaeocurrent  direction.

Given  a  sample  of  directions  with  a  single  center,  how  might  one
describe  the  scatter  or  dispersion  of  a  sample  of  directions?  If  the
bunch  of  vectors  is  very  tight  (i.e.  none  of  $\mathbf{r}_1, \ldots, \mathbf{r}_N$  make  much  of  an
angle  with  $\mathbf{R}$),  the  dispersion  is  small  and  $R$,  the  length  of  $\mathbf{R}$,  is  almost
as  large  as  $N$.  If  the  $\mathbf{r}_i$  point  in  many  directions,  the  dispersion  is
large  and  $R$  will  be  small.  Hence

$$\text{Dispersion of sample} = N - R \qquad (1)$$

would  be  a  sensible  definition.  This  needs  to  be  reduced  by  a  factor
like  $N$  to  get  it  on  a  per-observation  basis,  so  we  might  define

$$\text{Scatter} = \frac{N - R}{N} = 1 - \frac{R}{N}. \qquad (2)$$

It  is  clear  that  we  have  found  analogues  of  $\bar{x}$,  $\sum (x_i - \bar{x})^2$  and  $s^2$!
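Formulas  (1)  and  (2)  translate  directly  into  code.  This  sketch  assumes  the  input  vectors  already  have  unit  length  (it  does  not  normalize  them)  and  works  in  any  dimension;  `dispersion_and_scatter`  is  a  hypothetical  name.

```python
import math

def resultant(vectors):
    """Vector resultant R = sum of the unit vectors r_i (any dimension)."""
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) for k in range(dim)]

def dispersion_and_scatter(vectors):
    """Return (N - R, 1 - R/N), i.e. formulas (1) and (2); inputs must be unit vectors."""
    N = len(vectors)
    R = math.sqrt(sum(c * c for c in resultant(vectors)))
    return N - R, 1.0 - R / N

tight = [(1.0, 0.0, 0.0)] * 5                              # identical directions
spread = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0)]    # resultant is zero
print(dispersion_and_scatter(tight))    # (0.0, 0.0): no dispersion
print(dispersion_and_scatter(spread))   # (4.0, 1.0): maximal scatter
```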

If  we  turn  to  directions  in  three  dimensions  the  above  arguments 
and  definitions  still  make  sense.  To  visualize  such  data,  we  must  look 
at  points  on  the  surface  of  a  sphere.  To  show  them  on  a  two  dimensional 
page,  some  projection  must  be  used.  Different  projections  are  used  in 
different  subjects.  The  Lambert  projection  projects  a  hemisphere  so 
that  areas  are  preserved.  Thus  the  density  of  points  is  not  distorted. 
Hence  it  is  usually  best  for  statistics.  Special  paper  is  available  for
doing  this  manually.  The  point  with  spherical  polar  coordinates  $(\theta, \phi)$,
$0 \le \theta \le \pi/2$,  is  made  to  correspond  to  a  point  $(\rho, \psi)$  using  planar  polars,
where  $\psi = \phi$  and  $\rho = \sqrt{2}\,C\sin(\theta/2)$.  Thus  the  upper  hemisphere  is  mapped
onto  a  disc  of  radius  $C$.  Figure  2  shows  plots  of  some  sets  of  geological
data.  Efforts  are  often  made  to  "contour"  the  density  of  points  —  a 
generalization  of  histogramming.  The  paper  by  Watson  (1970)  gives  more 
details  and  references  and  relates  all  the  ways  different  subjects  have 
used  to  define  a  direction  in  three  dimensions. 
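The  mapping  just  described  can  be  sketched  in  a  few  lines.  This  illustration  follows  the  stated  formulas  only:  it  handles  the  upper  hemisphere  ($0 \le \theta \le \pi/2$),  returns  Cartesian  plot  coordinates,  and  leaves  aside  how  lower-hemisphere  points  should  be  treated.  The  loop  checks  the  equal-area  property  numerically,  using  the  fact  that  a  polar  cap  of  angular  radius  $a$  has  area  $2\pi(1 - \cos a)$  on  the  unit  sphere.

```python
import math

def lambert_upper(theta, phi, C=1.0):
    """Lambert equal-area projection of the upper hemisphere onto a disc of radius C.

    (theta, phi): spherical polar coordinates, 0 <= theta <= pi/2 (theta = 0 at the pole).
    The image has planar polars psi = phi, rho = sqrt(2) * C * sin(theta / 2);
    it is returned in Cartesian form (u, v) for plotting.
    """
    if not 0.0 <= theta <= math.pi / 2:
        raise ValueError("theta must lie in [0, pi/2]")
    rho = math.sqrt(2.0) * C * math.sin(theta / 2.0)
    return rho * math.cos(phi), rho * math.sin(phi)

# Equal-area check: (area of image disc) / (area of cap) should be the
# same constant, C**2 / 2, for every cap size a.
C = 1.0
for a in (0.1, 0.5, 1.0, math.pi / 2):
    rho = math.hypot(*lambert_upper(a, 0.0, C))
    print(round(a, 3), math.pi * rho ** 2 / (2 * math.pi * (1 - math.cos(a))))
```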

The  definitions  of  mean  direction,  dispersion  and  scatter  are  only
useful  for  data  like  that  in  Figure  2a.  It  will  be  noted  that  they  are
related  to  the  center  of  mass  of  the  points,  each  of  unit  mass.  By
considering  the  moment  of  inertia  (M.I.)  of  the  set  of  points  we  can  sort
out  other  configurations,  e.g.  bipolar  distributions  and  girdle  distributions.
Let  the  vector  $\mathbf{r}_i$  have  components  $x_i$,  $y_i$  and  $z_i$,  so  that  $x_i^2 + y_i^2 + z_i^2 = 1$.
Define
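$$M = \sum_{i=1}^{N} \mathbf{r}_i\mathbf{r}_i' = \begin{pmatrix} \sum x_i^2 & \sum x_i y_i & \sum x_i z_i \\ \sum y_i x_i & \sum y_i^2 & \sum y_i z_i \\ \sum z_i x_i & \sum z_i y_i & \sum z_i^2 \end{pmatrix}. \qquad (3)$$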


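The  matrix  $M$  is  symmetric  and  non-negative  definite.  Its  eigenvalues,
$\lambda_1 > \lambda_2 > \lambda_3$  say,  are  non-negative  (strictly  positive  unless  the  points  all  lie
on  one  great  circle)  and  add  to  $N = \operatorname{trace} M$.  They  are  the  stationary  values
of  $\mathbf{d}'M\mathbf{d}$,  where  $\mathbf{d}$  is  a  unit  vector,  and  the  eigenvectors  are  the  vectors  $\mathbf{d}$
yielding  these  values.  The  eigenvalues  and  eigenvectors  of  $M$  may  be
interpreted.  Consider  the  M.I.  of  the  unit  mass  at  point  $\mathbf{r}_i$  about  an
axis  through  the  origin  parallel  to  a  unit  vector  $\mathbf{d}$.  This  is  $1 - (\mathbf{d}'\mathbf{r}_i)^2$,
so  the  M.I.  of  all  the  points  is

$$\text{M.I.} = \sum_{i=1}^{N}\left(1 - \mathbf{d}'\mathbf{r}_i\mathbf{r}_i'\mathbf{d}\right) = N - \mathbf{d}'M\mathbf{d}.$$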
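The  identity  $\text{M.I.} = N - \mathbf{d}'M\mathbf{d}$,  and  the  fact  that  trace  $M = N$,  can  be  checked  numerically.  This  plain-Python  sketch  uses  an  arbitrary  illustrative  sample  and  axis;  the  helper  names  are  hypothetical.

```python
import math

def unit(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def orientation_matrix(vectors):
    """M = sum_i r_i r_i' for 3-D unit vectors r_i = (x_i, y_i, z_i), as in (3)."""
    M = [[0.0] * 3 for _ in range(3)]
    for r in vectors:
        for a in range(3):
            for b in range(3):
                M[a][b] += r[a] * r[b]
    return M

def moment_of_inertia(vectors, d):
    """M.I. of unit masses at r_i about the axis along unit d: sum_i (1 - (d'r_i)^2)."""
    return sum(1.0 - sum(di * ri for di, ri in zip(d, r)) ** 2 for r in vectors)

def quad_form(M, d):
    """The quadratic form d' M d."""
    return sum(d[a] * M[a][b] * d[b] for a in range(3) for b in range(3))

pts = [unit(v) for v in [(1, 2, 2), (0, 1, 0), (3, 0, 4), (1, 1, 1)]]
M = orientation_matrix(pts)
d = unit((0.2, -0.5, 1.0))
N = len(pts)
print(sum(M[k][k] for k in range(3)))                   # trace M equals N
print(moment_of_inertia(pts, d), N - quad_form(M, d))   # the two values agree
```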


Suppose  now  the  points  are  fairly  uniformly  distributed  around  a
great  circle.  The  direction  of  greatest  M.I.  is  perpendicular  to  the
great  circle.  The  M.I.  is  about  the  same  around  any  orthogonal
direction.  Thus  this  distribution  corresponds  to  one  small  root  and  two
nearly  equal  larger  roots.  One  large  and  two  small  roots  correspond  to  a
unipolar  or  bipolar  distribution,  and  these  can  be  distinguished  by  the
length  of  $\mathbf{R}$  (large  in  one  case  and  small  in  the  other).  If  all  three  roots
are  equal,  the  distribution  of  points  is  consistent  with  uniformity  on  the  sphere.

And  so  on. 

It  is  possible  to  write  a  simple  program  to  draw  Lambert  projections
of  samples  and  to  calculate  $\mathbf{R}$,  $R$,  $M$  and  its  eigenvectors  and  eigenvalues.
These  quantities  give  one  a  good  feel  for  the  data.
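The  "simple  program"  mentioned  above  might  look  like  the  following  sketch  (plotting  omitted).  It  assumes  NumPy  is  available  and  uses  `numpy.linalg.eigh`,  which  returns  the  eigenvalues  of  a  symmetric  matrix  in  ascending  order,  so  they  are  re-sorted  to  $\lambda_1 \ge \lambda_2 \ge \lambda_3$.  The  test  data  (points  evenly  spaced  on  a  great  circle)  should  show  the  girdle  pattern  described  earlier:  two  equal  large  roots,  one  near  zero,  with  the  smallest-root  eigenvector  pointing  to  the  pole  of  the  girdle.

```python
import numpy as np

def summarize(vectors):
    """Summary of 3-D unit vectors (rows of an N x 3 array): resultant length R,
    mean direction R/R, and eigenvalues/eigenvectors of M = sum r_i r_i',
    sorted so that lambda_1 >= lambda_2 >= lambda_3."""
    X = np.asarray(vectors, dtype=float)
    Rvec = X.sum(axis=0)
    R = np.linalg.norm(Rvec)
    mean_dir = Rvec / R if R > 0 else None
    M = X.T @ X
    evals, evecs = np.linalg.eigh(M)      # ascending order for a symmetric matrix
    order = np.argsort(evals)[::-1]
    return R, mean_dir, evals[order], evecs[:, order]

# 24 points evenly spaced around the great circle z = 0 (a girdle).
t = np.linspace(0.0, 2 * np.pi, 24, endpoint=False)
girdle = np.column_stack([np.cos(t), np.sin(t), np.zeros_like(t)])
R, mean_dir, lam, vecs = summarize(girdle)
print(round(R, 10), np.round(lam, 6))   # R near 0; roots near 12, 12, 0
print(np.round(np.abs(vecs[:, 2]), 6))  # smallest-root axis near (0, 0, 1)
```

A  unipolar  sample  would  instead  give  $\lambda_1 \approx N$  together  with  $R \approx N$,  and  a  bipolar  one  $\lambda_1 \approx N$  with  $R$  near  zero,  which  is  how  the  length  of  $\mathbf{R}$  separates  the  two  cases.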

3.  Parametric  distributions  and  tests.  If  the  observations  are
symmetrically  arranged  about  a  single  center  with  a  density  falling  off
as  one  moves  away  from  the  center,  one  will  not  go  far  wrong  in  assuming
they  are  a  sample  drawn  from  a  distribution  with  density  proportional  to
$\exp(\kappa\cos\theta)$,  where  $\kappa > 0$  is  an  accuracy  parameter  and  $\theta$  is  the  angle
between  the  center  and  the  observation.

We  will  now,  because  of  time,  stick  to  three  dimensions.  There
this  distribution  is  called  the  Fisher  distribution.  If  the  center  is
the  unit  vector  $\boldsymbol{\mu}$  and  $\mathbf{r}$  the  observation,  the  density  is

$$\frac{\kappa}{4\pi\sinh\kappa}\,\exp(\kappa\,\mathbf{r}'\boldsymbol{\mu}). \qquad (4)$$
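As  a  sanity  check  on  the  normalizing  constant  in  (4),  this  sketch  evaluates  the  density  and  integrates  it  over  the  sphere  with  the  midpoint  rule  in  the  angle  $\theta$  from  $\boldsymbol{\mu}$  (area  element  $2\pi\sin\theta\,d\theta$);  each  total  should  be  close  to  1.  It  assumes  $\kappa > 0$  and  moderate  ($\sinh\kappa$  overflows  in  double  precision  above  roughly  $\kappa = 710$).

```python
import math

def fisher_density(r, mu, kappa):
    """Fisher density (4): kappa / (4 pi sinh kappa) * exp(kappa * r'mu), kappa > 0."""
    cos_theta = sum(a * b for a, b in zip(r, mu))
    return kappa / (4.0 * math.pi * math.sinh(kappa)) * math.exp(kappa * cos_theta)

def total_mass(kappa, steps=20000):
    """Integrate the density over the unit sphere by the midpoint rule in theta."""
    h = math.pi / steps
    mu = (0.0, 0.0, 1.0)
    total = 0.0
    for k in range(steps):
        th = (k + 0.5) * h
        r = (math.sin(th), 0.0, math.cos(th))   # any point at angle th from mu
        total += fisher_density(r, mu, kappa) * 2.0 * math.pi * math.sin(th) * h
    return total

for kappa in (0.5, 2.0, 10.0):
    print(kappa, round(total_mass(kappa), 6))
```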


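With  data  $\mathbf{r}_1, \ldots, \mathbf{r}_N$,  it  is  easily  seen  that  the  maximum  likelihood  (m.l.)
estimates  of  $\boldsymbol{\mu}$  and  $\kappa$  are

$$\hat{\boldsymbol{\mu}} = \text{the direction of } \mathbf{R} = \mathbf{R}/R \qquad (5)$$

and  $\hat\kappa$,  the  solution  of

$$\coth\hat\kappa - \frac{1}{\hat\kappa} = \frac{R}{N}. \qquad (6)$$

If  $\hat\kappa$  is  greater  than  3,  $\coth\hat\kappa \approx 1$,  so  that

$$\hat\kappa \simeq \frac{N}{N - R}. \qquad (7)$$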
Now  $\kappa$  is  an  "accuracy"  parameter,  the  opposite  of  scatter,  and  (7)  gives
$1/\hat\kappa \simeq 1 - R/N$,  so  this  matches  our  intuitive  formula  (2).  This  fortifies  our
belief  that  (2),  (5)  and  (7)  make  sense  even  if  (4)  is  not  quite  true.
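Equation  (6)  has  no  closed-form  solution,  but  its  left-hand  side  increases  steadily  from  0  to  1,  so  a  bracketing  search  suffices.  This  sketch  uses  bisection  (our  choice;  Newton's  method  would  also  work)  and  compares  the  result  with  approximation  (7).  The  values  $N = 20$,  $R = 18.5$  are  invented  for  illustration.

```python
import math

def A3(kappa):
    """coth(kappa) - 1/kappa: left-hand side of (6), increasing from 0 to 1."""
    return 1.0 / math.tanh(kappa) - 1.0 / kappa

def kappa_mle(rbar):
    """Solve coth(k) - 1/k = R/N (equation (6)) by bisection; rbar = R/N."""
    if not 0.0 < rbar < 1.0:
        raise ValueError("need 0 < R/N < 1")
    lo, hi = 0.0, 1.0
    while A3(hi) < rbar:          # widen the bracket until it contains the root
        hi *= 2.0
    for _ in range(200):          # halve the bracket; 200 steps exhaust double precision
        mid = 0.5 * (lo + hi)
        if A3(mid) < rbar:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

N, R = 20, 18.5                   # illustrative values, not data from the text
k_exact = kappa_mle(R / N)
k_approx = N / (N - R)            # approximation (7), intended for kappa > 3
print(k_exact, k_approx)
```

For  $\hat\kappa$  around  13,  as  here,  $\coth\hat\kappa$  differs  from  1  by  about  $10^{-11}$,  so  the  two  values  agree  closely;  for  small  $R/N$  only  the  bisection  result  is  reliable.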

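If  $\boldsymbol{\mu}$  is  known,  the  m.l.  estimate  of  $\kappa$  is,  by  the  same  large-$\kappa$  approximation,

$$\hat\kappa \simeq \frac{N}{N - X}, \qquad X = \mathbf{R}'\boldsymbol{\mu}. \qquad (8)$$

Thus  $N - X$  is  evidently  the  dispersion  of  the  sample  about  $\boldsymbol{\mu}$,  just  as
$N - R$  is  the  dispersion  of  the  sample  about  $\hat{\boldsymbol{\mu}}$.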
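For  known  $\boldsymbol{\mu}$,  (8)  is  a  one-liner;  this  sketch  also  returns  both  dispersions  so  their  relation  can  be  seen.  Because  $X = \mathbf{R}'\boldsymbol{\mu} \le R$,  $N - X$  can  never  be  smaller  than  $N - R$,  with  equality  when  $\boldsymbol{\mu}$  is  the  direction  of  $\mathbf{R}$.  The  sample  and  $\boldsymbol{\mu}$  are  illustrative,  and  `kappa_known_mu`  is  a  hypothetical  name.

```python
import math

def unit(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def kappa_known_mu(vectors, mu):
    """Large-kappa approximation (8) when the mean direction mu is known.

    Returns (kappa_hat, N - X, N - R) with X = R'mu, R = |R|.
    """
    N = len(vectors)
    Rvec = [sum(v[k] for v in vectors) for k in range(3)]
    X = sum(r * m for r, m in zip(Rvec, mu))
    R = math.sqrt(sum(r * r for r in Rvec))
    if N - X <= 0.0:
        raise ValueError("all observations coincide with mu; estimate is infinite")
    return N / (N - X), N - X, N - R

pts = [unit(v) for v in [(0.1, 0, 1), (-0.1, 0, 1), (0, 0.1, 1), (0, -0.1, 1)]]
k_hat, disp_mu, disp_hat = kappa_known_mu(pts, unit((0.05, 0.0, 1.0)))
print(round(k_hat, 2), round(disp_mu, 4), round(disp_hat, 4))
```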

When  $\kappa = 0$,  (4)  is  the  uniform  distribution.  It  is  often  necessary
to  test  whether  this  is  so.  For  single-cluster  alternatives,  we  will
naturally  reject  if  $R$  is  too  large.  It  is  easily  shown  that


so  the  test  is  easy  to  make.  (For  more  complex  alternatives,  other  tests 
are  appropriate.) 
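The  test  statistic  itself  is  illegible  in  the  source  at  this  point.  As  a  stand-in,  this  sketch  uses  the  standard  large-sample  form  of  the  test  (the  Rayleigh  test  in  three  dimensions):  under  uniformity  each  $\mathbf{r}_i$  has  mean  zero  and  covariance  $I/3$,  so  $3R^2/N$  is  approximately  $\chi^2_3$.  The  $\chi^2_3$  upper  tail  is  computed  in  closed  form  with  `math.erfc`.  This  is  an  assumption  about  what  the  author  intended,  not  a  reconstruction  of  his  formula.

```python
import math

def chi2_3_sf(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 3 d.f."""
    if x <= 0.0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)

def uniformity_test_3d(vectors):
    """Large-sample test of uniformity on the sphere against a single cluster.

    Under uniformity 3 R^2 / N is approximately chi-square with 3 d.f.;
    large R (hence a small p-value) suggests a preferred direction.
    """
    N = len(vectors)
    R2 = sum(sum(v[k] for v in vectors) ** 2 for k in range(3))
    stat = 3.0 * R2 / N
    return stat, chi2_3_sf(stat)

balanced = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
clustered = [(0.0, 0.0, 1.0)] * 10
print(uniformity_test_3d(balanced))    # R = 0: statistic 0, p-value 1
print(uniformity_test_3d(clustered))   # R = N = 10: statistic 30, tiny p-value
```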

To  test  whether  a  sample  comes  from  (4)  with  a  given  mean  or  polar
direction,  one  may  consider  the  analysis  of  dispersion

[Analysis-of-dispersion  table  illegible  in  source.]

so  that  the  test  is

$$F_{2,\,2(N-2)} = (N - 2)\,\frac{R_1 + R_2 - R}{N - R_1 - R_2}. \qquad (16)$$

The  logic  of  (16)  is  seen  from  the  triangle  [figure  illegible  in  source]

(17)
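Statistic  (16)  can  be  computed  as  below.  This  sketch  assumes  two  samples  of  sizes  $N_1$  and  $N_2$  with  $N = N_1 + N_2$,  resultant  lengths  $R_1$  and  $R_2$,  and  $R$  the  resultant  length  of  the  pooled  sample.  For  $F$  with  2  and  $m$  degrees  of  freedom  the  upper-tail  probability  has  the  closed  form  $(1 + 2f/m)^{-m/2}$,  so  no  statistics  library  is  needed.  The  two  samples  are  invented  for  illustration.

```python
import math

def _resultant(vectors):
    """Vector resultant and its length for a list of 3-D unit vectors."""
    s = [sum(v[k] for v in vectors) for k in range(3)]
    return s, math.sqrt(sum(c * c for c in s))

def two_sample_test(sample1, sample2):
    """F = (N - 2)(R1 + R2 - R) / (N - R1 - R2), N = N1 + N2, with its p-value
    from F(2, m), m = 2(N - 2): P(F > f) = (1 + 2 f / m) ** (-m / 2)."""
    s1, R1 = _resultant(sample1)
    s2, R2 = _resultant(sample2)
    R = math.sqrt(sum((a + b) ** 2 for a, b in zip(s1, s2)))
    N = len(sample1) + len(sample2)
    denom = N - R1 - R2
    if denom <= 0.0:
        raise ValueError("no within-sample dispersion; statistic undefined")
    # R1 + R2 - R >= 0 by the triangle inequality; clamp tiny rounding negatives.
    F = max(0.0, (N - 2) * (R1 + R2 - R) / denom)
    m = 2 * (N - 2)
    return F, (1.0 + 2.0 * F / m) ** (-m / 2.0)

def unit(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

near_z = [unit(v) for v in [(0.1, 0, 1), (-0.1, 0, 1), (0, 0.1, 1), (0, -0.1, 1)]]
near_x = [unit(v) for v in [(1, 0.1, 0), (1, -0.1, 0), (1, 0, 0.1), (1, 0, -0.1)]]
print(two_sample_test(near_z, near_z))   # same mean direction: F near 0, p near 1
print(two_sample_test(near_z, near_x))   # directions 90 degrees apart: large F, small p
```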


It  has  been  shown  that  these  tests  are  robust  against  quite  severe
changes  in  the  parental  law  of  Fisher.  The  extreme  outliers  that  play
havoc  with  analyses  on  the  line  cannot  occur  here.
Figure  2.  [Opening  words  illegible.]  ...patterns  of  preferred  orientation  (median
girdles  shown  by  broken  lines).  (a)  Maximum  (symmetric):  1^0  lineations  from
Loch  Leven,  Scottish  Highlands.  (b)  Girdle:  1,000  poles  of  foliation  from
Turoka,  Kenya.  (c)  "Crossed  girdle":  3y0  [0001]  of  quartz  from  quartzite,
Barstow,  California.  (d)  Small  circle  or  "cleft"  girdle:  1^10  [0001]  of  quartz
from  Orocopia  schist,  California.  (After  J.  M.  Christie;  redrawn  from  Turner
and  Weiss  (1963).)
AN  INVESTIGATION  OF  WIND  FREQUENCY  RESPONSE  ON  THE  M50  ROCKET


Bernard  F.  Engebos  and  Abel  J.  Blanco 
Atmospheric  Sciences  Laboratory,  US  Army  Electronics  Command 
White  Sands  Missile  Range,  New  Mexico 


ABSTRACT 

The  wind  frequency  response  for  the  M50  rocket  was  studied  using
quadrant  elevation  angles  of  200,  400,  and  800  mils.  The  wind  profile
was  assumed  to  be  of  the  form  of  finite  Fourier  series  with  statistically 
determined  amplitudes  obtained  through  a  random  number  generator. 

Impacts  were  simulated  using  a  5-degree-of-freedom  trajectory  model
and  100  randomly  generated  wind  profiles.  Correlations  between  the
randomly  generated  amplitudes  and  the  simulated  displacements  were  then 
computed.  So  far  the  results  are  inconclusive  and  improvement  is  necessary. 

INTRODUCTION 

The  main  reason  for  this  study  is  to  delineate  the  degree  of  fidelity  with 
which  the  wind  field  must  be  known  to  achieve  acceptable  rocketry 
accuracy.  Specifically,  how  high  must  a  space  frequency  of  wind  be  before 
the  flight  path  of  the  rocket  is  essentially  unaffected  by  that  frequency? 
An  exact  answer  to  this  question  would  involve  exhaustive  studies  into 
meteorological  data  collection,  data  analysis,  and  the  aerodynamics 
involved  in  treating  wavelengths  of  wind.  Several  preliminary  studies
[1-5]  concerning  this  type  of  problem  have  been  reported.  The  objective  of
the  study  is  to  find  optimal  wind  layers  so  that  relatively  accurate
impacts  can  be  achieved.  To  accomplish  this  end,  a  5-degree-of-freedom
ballistic  simulation  model  was  used  [6].

The  Honest  John  M50  rocket  is  approximately  8  meters  long.  The 
mathematical  ballistic  model  (linear  aerodynamics)  used  to  calculate 
theoretical  trajectories  assumes  a  net  aerodynamic  force  acting  through 
the  center  of  pressure  of  the  rocket,  which  is  equivalent  to  assuming 
a  constant  angle  of  attack  over  the  surface  of  the  rocket,  i.e.,  the 
wind  is  invariant  along  the  rocket's  length.  Thus,  it  is  difficult  to 
speak  of  the  effect  of  wind  oscillations  when  the  wavelength  is  of  the
same  order  as  the  length  of  the  rocket.  As  a  result,  the  highest  space
frequency  considered  corresponds  to  a  wavelength  of  16  meters  (twice 
the  length  of  the  rocket).  For  all  trajectory  simulations  considered, 
all  atmospheric  and  aerodynamic  data  on  the  rocket  but  the  wind  were 
held  constant. 


The  remainder  of  this  article  was  reproduced  photographically  from  the 
authors'  manuscript. 
DISCUSSION


The  input  wind  conditions  were  components  of  the  horizontal  wind
as  functions  of  the  altitude  z.

The  first  approach  consisted  of  letting

$$w_{x,i}(z) = 5\sum_{j=1}^{m}\left(A_{ij}\cos\omega_j z + B_{ij}\sin\omega_j z\right)$$

$$w_{y,i}(z) = 5\sum_{j=1}^{m}\left(C_{ij}\cos\omega_j z + D_{ij}\sin\omega_j z\right)$$


where  $w_{x,i}(z)$  is  the  east-west  and  $w_{y,i}(z)$  the  north-south  component  of
the  $i$-th  wind  profile  (m/sec),  and  $\omega_j$  represents  $m$  wind  space  frequencies
with  wavelengths  in  intervals  of  16  meters  up  to  the  burnout  altitude
of  the  rocket.  It  should  be  noted  that  $m$  is  equal  to  the  greatest
integer  value  of  the  quotient  of  the  burnout  altitude  and  16.

The  various  coefficients  of  the  trigonometric  functions  were  generated
by  a  random  number  generator,  assuming  a  normal  distribution  with  mean
zero  and  standard  deviation  one.  The  multiplier  of  5  was  chosen
to  ensure  representative  wind  magnitudes.  This  yields  a  finite
Fourier  series  for  the  wind  components.
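A  sketch  of  how  one  random  profile  of  this  form  could  be  generated.  Reading  "wavelengths  in  intervals  of  16  meters"  as  $\omega_j = 2\pi/(16j)$  rad/m  is  our  assumption,  as  is  the  200  m  burnout  altitude  in  the  usage  line;  the  authors'  own  generator  and  trajectory  model  are  not  described  beyond  the  text  above.

```python
import math
import random

def random_wind_profile(burnout_alt, rng, amp=5.0, step=16.0):
    """Draw one random wind profile (w_x(z), w_y(z)) as a finite Fourier series.

    HYPOTHETICAL reading of the text: component j has wavelength j * step metres,
    so omega_j = 2*pi / (j * step) rad/m, for j = 1..m, m = floor(burnout_alt / step).
    A, B, C, D are independent N(0, 1) draws; amp is the multiplier (5 in the text).
    """
    m = int(burnout_alt // step)
    omega = [2.0 * math.pi / (j * step) for j in range(1, m + 1)]
    A, B, C, D = ([rng.gauss(0.0, 1.0) for _ in range(m)] for _ in range(4))

    def wx(z):  # east-west component, m/s
        return amp * sum(a * math.cos(w * z) + b * math.sin(w * z) for a, b, w in zip(A, B, omega))

    def wy(z):  # north-south component, m/s
        return amp * sum(c * math.cos(w * z) + d * math.sin(w * z) for c, d, w in zip(C, D, omega))

    return wx, wy, (A, B, C, D)

rng = random.Random(1972)                          # seeded for reproducibility
wx, wy, coeffs = random_wind_profile(200.0, rng)   # 200 m burnout altitude: illustrative
print(len(coeffs[0]), round(wx(100.0), 3), round(wy(100.0), 3))
```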


Associated  with  the  $i$-th  wind  profile  is  the  simulated  impact
point  $(x_i, y_i)$.  This  point  was  obtained  by  using  the  above  wind
profile  during  the  power-on  and  power-off  portions  of  flight.  Let
$(x_0, y_0)$  be  the  no-wind  impact  point.  Setting

$$Dx_i = x_i - x_0, \qquad Dy_i = y_i - y_0, \qquad TD_i = \left(Dx_i^2 + Dy_i^2\right)^{1/2},$$

$$P_{ij} = \left(A_{ij}^2 + B_{ij}^2 + C_{ij}^2 + D_{ij}^2\right)^{1/2},$$

one  can  then  compute  correlation  coefficients  as  follows  (each  for
$j = 1, 2, \ldots, m$):

$Dx_i$  vs  $A_{ij}$
$Dx_i$  vs  $B_{ij}$
$Dy_i$  vs  $C_{ij}$
$Dy_i$  vs  $D_{ij}$
$TD_i$  vs  $P_{ij}$
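The  quantities  defined  above,  and  a  correlation  taken  over  the  profiles  $i$,  can  be  sketched  as  follows.  The  impact  points  would  come  from  the  5-degree-of-freedom  trajectory  model,  which  is  not  reproduced  here,  so  the  functions  simply  take  them  as  inputs;  all  names  are  hypothetical,  and  the  form  of  $P_{ij}$  follows  the  reconstruction  above.

```python
import math

def displacement(impact, nominal):
    """(Dx_i, Dy_i, TD_i) of a simulated impact point from the no-wind point."""
    dx = impact[0] - nominal[0]
    dy = impact[1] - nominal[1]
    return dx, dy, math.hypot(dx, dy)

def amplitude(a, b, c, d):
    """P_ij = sqrt(A_ij^2 + B_ij^2 + C_ij^2 + D_ij^2) for one profile and frequency."""
    return math.sqrt(a * a + b * b + c * c + d * d)

def pearson(u, v):
    """Sample (Pearson) correlation coefficient of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

def td_vs_p(impacts, nominal, coeffs, j):
    """Correlation over profiles i of TD_i with P_ij for a fixed frequency index j.

    impacts: list of (x_i, y_i); coeffs: list of (A_i, B_i, C_i, D_i), each a
    list of length m holding that profile's coefficients.
    """
    td = [displacement(p, nominal)[2] for p in impacts]
    p_j = [amplitude(A[j], B[j], C[j], D[j]) for (A, B, C, D) in coeffs]
    return pearson(td, p_j)
```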


Only  the  latter  correlation  coefficient  is  shown,  since  the  others  are
similar.  Figure  1  shows  this  correlation  coefficient  versus  the  wavelength
of  the  space  frequency  for  a  quadrant  elevation  angle  of  200  mils,
100  wind  profiles,  and  m  equal  to  12.  One  should  note  here
that  this  correlation  coefficient  is  low  in  value.  This  may  be  caused 
by  too  few  cases  considered  and/or  by  the  Doppler  effect  on  the  space 
frequency  due  to  the  rocket’s  changing  velocity. 

Figures  2  and  3  show  similar  results  for  quadrant  elevation 
angles  of  400  and  800  mils,  respectively. 

Another  approach  involves  holding  the  amplitude  of  the  wind  constant
and  varying  the  space  frequency  $\omega_j$;  i.e.,  set

$$w_{x,j}(z) = 5\cos\omega_j z, \qquad w_{y,j}(z) = 5\sin\omega_j z.$$

Figure  4  is  a  plot  of  total  displacement  versus  the  various  wavelengths 
of  the  space  frequency  for  200,  400,  and  800  mil  trajectory  simulations. 
As  the  wavelengths  increase,  the  total  displacement  of  the  simulated 
impact  point  from  the  nominal  impact  point  also  increases.  This  is 
quite  logical  since  the  low  frequency  wind  occurring  during  the 
powered  portion  of  the  rocket's  trajectory  (most  of  the  wind  weighting 
effect  [ \1\ ]  occurs  here)  appears  more  as  a  trending  wind.  In  the  "real 
world  situation,"  higher  frequencies  have  smaller  amplitude  and  thus 
can  be  neglected.  Generally,  frequencies  with  wavelengths  less  than
50  meters  can  be  ignored  for  the  M50  rocket,  as  can  be  seen  from
Figure  4.

Combining  several  wind  space  frequencies  into  a  single  wind  profile
seems  to  confirm  that  space  frequencies  are  relatively  independent  of
one  another  in  influencing  the  rocket's  flight  path.

CONCLUDING  REMARKS

Several  questions  still  are  unanswered: 

(1)  Will  the  statistically  derived  wind  profile  technique  described 
herein  be  successful  when  the  sampling  size  is  increased? 

(2)  What  about  wind  measurements  as  a  function  of  horizontal  range? 

(3)  At  what  altitudes  do  wind  space  frequencies  most  affect  the
rocket  impact  accuracy?

(4)  Is  there  a  "best"  way  to  determine  optimal  wind  layering  to 
ensure  relatively  accurate  impacts? 

A  solution  to  the  above  questions  is  extremely  difficult.  Any  possible 
suggestions  on  how  to  solve  this  overall  problem  would  be  greatly  appreciated. 


PROBLEMS  IN  DESIGNING  EXPERIMENTS  WITH 
LARGE  NUMBERS  OF  VARIABLES 

Roger  L.  Brauer  and  Charles  C.  Lozar
Special  Projects  Division  Architecture  Branch 
Construction  Engineering  Research  Laboratory 
Champaign,  Illinois 


The  performance  of  buildings  is  of  primary  concern  to  architects.  The 
designer  desires  to  produce  a  building  for  human  occupancy  that  not  only 
achieves  high  performance  of  materials  of  construction  but  also  achieves 
high  quality  for  the  user.  The  designer  wishes  to  achieve  a  high  degree  of 
user  satisfaction  and  support  for  user  performance. 

The  problem  we  wish  to  present  at  this  clinical  session  occurs  in  the 
evaluation  of  buildings  for  users  and  in  measurement  of  the  quality  of  the 
constructed  space.  In  completing  such  evaluations  and  designing  experiments 
which  will  measure  comparative  differences  between  buildings  there  is  a 
tremendous  number  of  variables  that  can  contribute  to  user  satisfaction  and 
performance  within  buildings.  We  are  faced  with  a  problem  in  measuring  and 
accounting  for  this  wide  range  of  factors  and  in  putting  them  all  together 
in  an  experimental  design.  The  results  of  such  experiments  are  intended  to 
help  establish  design  criteria  that  will  increase  the  satisfaction  of  users 
and  meet  their  needs,  as  well  as  satisfy  the  requirements  of  management. 

Briefly  the  evaluation  of  buildings  must  include  the  assessment  of: 

a.  Physical  conditions, 

b.  Functionality, 

c.  Attitudes  of  users  about  conditions, 

d.  Behavior  and  performance  of  users, 

e.  Cost. 

The  physical  conditions  include  space,  heat,  light,  sound,  color,  and
furnishings.  Functionality  includes  such  things  as  traffic  flow,  productivity  of
workers,  etc.  The  attitudes  of  building  users  can  be  influenced  not  only 
by  the  physical  conditions  but  also  by  organizational  climate,  personal 
factors,  and  demographics.  The  behavior  of  people  within  a  building  can 
be  social  or  nonsocial  and  their  performance  can  be  in  terms  of  sickness, 
absenteeism  and  productivity.  Finally,  cost  is  important  to  determine  the 
cost  effectiveness  of  designs. 

Consider  the  case  of  family  housing  as  an  example  of  the  large
number  of  variables  that  can  influence  user  performance  and  satisfaction.

Many  of  these  are  shown  in  Figure  1.  Everyone  has  some  idea  of  what 
contributes  to  making  a  house  acceptable,  pleasant,  desirable,  and
satisfying  to  its  occupants  and  how  well  the  design  of  the  house  supports  the
activities  that  occur  there.

In  describing  the  physical  condition  of  the  house  it  is  clear
that  the  various  rooms  in  the  house  individually  and  collectively
contribute  to  the  quality  of  the  house.  Each  room  can  be  described  in  several
ways.  The  number  of  bedrooms  would  be  important,  as  would  the  size  of
the  bedrooms,  the  amount  of  storage  space,  the  lighting,  the  temperature,
the  state  of  repairs,  room  color,  type  of  wall  and  woodwork  finish,  type
of  flooring,  the  kind  of  windows,  the  arrangement  of  one  bedroom  relative
to  the  others,  the  arrangement  of  bedrooms  relative  to  the  rest  of  the
house,  the  distance  from  the  bedrooms  to  the  bathroom,  and  so  on.  Each
of  the  other  rooms  in  the  house  could  be  described  similarly.  Each
condition  in  each  room  contributes  in  some  degree  to  the  overall  quality  of
the  house.


Besides  physical  conditions,  other  factors  can  also  influence  the
quality  of  the  house.  One  could  be  the  overall  cost  of  the  house.  If  a
house  is  too  cheap,  it  may  not  be  durable;  if  it  is  too  expensive,  it  may
place  other  constraints  on  family  finances.  The  geographical  location  may
be  important  to  members  of  the  family.  If  the  house  is  located  in  the
wrong  part  of  the  country,  no  house  would  be  good  enough  to  satisfy  the
user.  Geographical  location  might  include  distances  to  shops  and  stores,
convenience  to  schools  and  convenience  to  work.  Again,  each  of  these
factors  contributes  to  some  degree  to  the  quality  of  the  house  and  to  the
satisfaction  of  its  users.

Furthermore,  each  of  these  conditions  may  not  contribute  directly
to  the  quality  of  the  space  or  to  user  satisfaction;  their  effects  are
usually  modified  by  intervening  factors.  The  intervening  factors  could
include  previous  experience  with  other  houses  and  differences  in  personal
composition:  personality  factors,  age,  level  of  income,  social  status,  and
other  demographics.  Attitudes  are  also  important.  There  can  be  attitudes
towards  the  physical  conditions  themselves,  attitudes  about  one's  job,
general  attitudes  towards  the  Army,  and  attitudes  towards  neighbors,  the
neighborhood,  the  community  and  the  geographical  location.  In  addition,
behavior  can  have  an  effect  on  how  the  physical  conditions  relate  to  the
quality  of  the  space  or  user  satisfaction.  If  little  time  is  spent  in  the
house,  the  occupant  may  not  be  as  critical  about  conditions.  On  the  other
hand,  the  occupant  may  prefer  to  do  activities  at  home  for  which  the
design  of  the  house  is  not  very  accommodating.  The  design  of  the  house
may  directly  impact  the  health  and  safety  of  the  occupant.  All  these
intervening  factors  in  some  way  mediate  the  effect  of  physical  and  other
conditions  on  the  quality  of  the  space  and  on  user  satisfaction  and
performance.

In  designing  an  experiment  which  is  intended  to  evaluate  the
quality  of  the  house  and  how  satisfied  the  user  is  with  it,  all  these  factors
must  be  measured  and  the  effect  that  each  one  has  on  the  quality  of  the
space  or  on  user  satisfaction  must  be  determined.  To  complicate  matters,
it  is  clear  that  interactions  or  interrelations  exist  between  the  physical
conditions,  attitudes,  behaviors,  previous  experience,  and  personal
composition.  The  strength  of  these  interrelationships,  as  well  as  their
effects,  must  also  be  determined.

Other  buildings  could  similarly  be  described  and  evaluated  in
terms  of  user  satisfaction  and  performance  as  a  measure  of  quality  of
design.  Some  buildings  need  to  be  evaluated  on  a  more  micro-scale  basis.
Here  the  problems  of  all  of  the  other  variables  mentioned  still  exist,
with  one  additional  parameter  attached.  In  order  that  the  designer  be
able  to  make  decisions  about  the  physical  character  of  the  space  he
provides,  he  needs  the  psycho-social  data  to  be  location-specific.  He
requires  that  the  data  be  connected  to  an  identifiable  physical  location
in  a  room,  or  a  specific  room  in  a  building.  Much  of  the  psychological
literature  to  date  ignores  this  need  and  is  therefore  regarded  as  "unusable
theory"  by  many  designers.  We  find  a  need  to  quantify  the  behavioral
success  of  a  building  and  relate  these  data  to  specific  locations  in  the
environment.  Therefore  the  problem  under  examination  in  this  part  of  the
discussion  is  the  quantification  of  the  degree  of  "fit"  between  man  and
his  environmental  setting  and  the  identification  of  behavioral  "units"  of
designable  environment.

A  basic  "chunk"  or  unit  of  man-environment  interaction  called  a
behavior  setting  has  been  identified  in  the  psychological  literature  (Barker,
1968)  and  applied  to  such  contexts  as  housing  (Bechtel,  1970)  and  hospitals
(LeCompte,  1972).  This  unit  shows  great  promise  for  the  analysis  of
micro-scale  architectural  environments.  Barker  states  that  this  "chunk,"  the
behavior  setting,  is  characterized  by  a  standing  pattern  of  clearly
identifiable  behaviors  regardless  of  participants.  Examples  might  be  classes
of  behaviors  in  restaurants,  libraries,  and  supermarkets.  In  each  of
these  contexts,  patterns  of  behavior  are  similar  for  participants,  and  are
independent  of  individuals.  The  behavior  setting  analysis  technology
developed  by  Barker  provides  a  system  for  identifying  chunks  of
location-specific  behavior  and  notating  activities  and  attitudes  to  these.
Just  as  a  language  has  a  vocabulary,  syntax,  and  rules  of  grammar,  the
behavior  setting  has  units,  qualities,  and  degrees  of  independence  and
interdependence.  Now,  the  problem  of  using  these  "chunks"  of
man-environment  interaction  becomes  more  difficult  when  they  are  structured
into  an  experimental  situation  to  provide  information  usable  by  the  designer.

An  example  of  a  military  dining  hall  might  serve  to  develop  the
concept  of  behavior  and  attitude  relationships  related  to  specific  units
of  environment.  In  the  normal  dining  process  of  food  acquisition,
respondents  experience  physical  and  social  environment  in  a  linear  sequence
(Thiel,  1961).  Each  of  the  activities  subjects  engage  in  can  be
differentiated  by  the  nature  of  the  behavior  mechanisms  employed,  i.e.,
gross  motor,  manipulative,  motion,  etc.  On  the  basis  of  this  differentiation,
we  can  identify  some  ten  distinctly  different  key  behavior  settings,  such
as  the  sign-in  desk,  silverware  pick-up,  path  to  table,  etc.  Now  the  behavior
in  the  physical  space  of  one  setting  will  affect  the  attitudes  of
respondents  at  another  setting  elsewhere  in  the  dining  space.  We  know  that
respondents  will  rate  the  concept  of  "privacy"  lower  at  the  dining  table  as
the  number  of  persons  standing  in  line  at  the  sign-in  desk  increases  (Gibbs,
1972).  The  behavioral  data  in  this  case  are  location-specific.  The  designer
can  make  decisions  of  a  physical  nature  to  change  the  attitude  rating.  He
can  shield  the  check-in  desk  from  view,  provide  two  desks  for  faster
processing,  or  move  the  check-in  desk  elsewhere,  all  with  the  intent  of
increasing  the  sense  of  privacy  at  the  dining  table.  In  this  example,
behavioral  data  (number  of  persons  in  line)  and  attitude  (rating  of  privacy)
have  been  made  location-specific  (sign-in  desk,  table)  and  the  designer
can  make  decisions  based  upon  this  information.  However,  it  is  not  quite
that  simple.

From  our  previous  discussion,  we  realize  the  great  number  of
variables  interacting  in  any  social  setting.  Obviously  privacy  and
population  at  the  sign-in  desk  are  not  independent  of  other  factors  in  the
environment.  The  dining  "privacy"  experience  would  be  affected  not  only  by
population  movement,  but  also  by  physical  conditions,  noise  level,  and  the
management  climate.  To  some  degree,  all  of  these  affect  the  rating  of
privacy.  It  is  the  combination  of  factors  that  cumulatively  makes  up  the
concept  of  privacy,  and  the  problem  is  again  one  of  a  large  number  of
variables.  It  is,  of  course,  possible  that  the  designer  could  not  solve  them
all,  but  certainly  a  knowledge  of  what  amount  of  the  total  variance  could
be  accounted  for  by  changes  in  physical  design  would  be  useful  in
developing  a  measure  of  cost  effectiveness  for  changes  in  the  environment.


We  have  presented  two  kinds  of  problems  in  environmental  analysis,
each  involving  a  great  number  of  variables.  The  first  involved  the  range
of  parameters  and  variables  existent  in  any  environmental  setting,  and  the
second  addressed  the  need  for  identifiable  "chunks"  of  behavior-environment
interaction  which  the  designer  might  address  to  begin  his  architectural
translation  process,  with  the  intent  of  improving  the  performance  of  the
building.  The  designer  realizes  that  behavior  and  environment  interaction
is  not  a  one-valued  concept,  but  rather  multi-dimensional  and  interactive.
The  behavioral  scientist  would  certainly  believe  that  interactions  between
the  many  variables  suggested  in  this  paper  are  probably  more  complicated
than  simple  correlation  coefficients  would  describe,  yet  he  would  not
suggest  a  priori  a  series  of  interrelated  factors.
The  question  is  then,  what  kinds  of  research  designs  and  analytic  techniques 
will  lend  themselves  to  discovering  major  components  of  human  interaction  with 
environment.  Multiple  linear  regression  (Brauer,  1972),  factor  analysis 
(Canter,  1972),  and  cluster  analysis  (Bechtel,  1972)  have  been  suggested. 
Since  it  is  seldom  possible  to  experimentally  control  variables
in  building  designs,  and  comparative  demonstration  projects  are  far  too
expensive,  suggestions  for  handling  a  large  number  of  variables  in  a
semi-controlled,  real  world  situation  are  needed.  Are  there  other  techniques
more  responsive  than  those  suggested  to  discovering  relations  and  major
factors  in  large  data  matrices?  Can  they  also  relate  disparate  data  from
attitude  and  behavioral  investigations  to  location-specific  chunks  of
environment?  Will  these  techniques  be  more  or  less  responsive  than
present  ones  to  investigations  in  the  real  world  context,  and  what  sort
of  experimental  controls  are  necessary?  These  are  the  questions  we  ask  in
order  to  build  a  firm  scientific  basis  for  the  design  of  buildings  that
are  compatible  with  human  behavior  and  needs.

REFERENCES 

Barker,  Roger,  Ecological  Psychology,  Stanford  University  Press,
Stanford,  California,  1968.

Bechtel,  Robert,  Arrowhead  Report,  Environmental  Research  and  Development
Foundation,  Kansas  City,  Missouri,  July  1971.

Bechtel,  Robert,  Conference  with  the  author,  December  4,  1972,  at  CERL.

Brauer,  Roger,  Survey  of  Soldiers'  Attitudes  to  Troop  Housing,  U.S.  Army
Construction  Engineering  Research  Laboratory,  Champaign,  Illinois,
January  1973  (in  press).

Canter,  David,  "Royal  Hospital  for  Sick  Children,"  The  Architect's
Journal,  September  6,  1972,  pp.  525-564.

Gibbs,  Wesley,  Dining  Facility  User-Attitude  Survey  and  Decor  Experiment
at  Travis  AFB,  U.S.  Army  Construction  Engineering  Research  Laboratory,
Champaign,  Illinois,  January  1973.

LeCompte,  "Behavior  Settings:  The  Structure  of  the  Treatment
Environment,"  EDRA  III/AR  8  Proceedings,  UCLA,  Los  Angeles,  California,
January  1972,  p.  4.2.




ORTHOGONAL  ESTIMATES  IN  WEIGHING  DESIGNS 
William  G.  Lese,  Jr. 

US Army Materiel Systems Analysis Agency, APG, Maryland

and 

K.  S.  Banerjee 

University  of  Delaware,  Newark,  Delaware 

ABSTRACT 

A new technique has been developed for modifying all balanced incomplete
block designs (BIBD) to provide orthogonal estimates when the modified BIBD
are to be used as weighing designs. Previously, K. S. Banerjee developed a
method for modifying BIBD to provide orthogonal estimates. However, for a
certain class of BIBD, Banerjee's method failed to provide orthogonal
estimates. A comparison of the relative efficiencies of the new procedure
with those of Banerjee's procedure is also presented. In addition, it is
shown that under the new procedure the covariance matrix of the estimators
obtained by the least squares procedure is identical to that obtained by
the maximum likelihood procedure, even when the design matrix X is not
square. Several examples of the use of the new technique are presented,
along with a historical development of the weighing problem, as it relates
to the problem of providing orthogonal estimates, from its origin in a
casual example by Yates through the work of Hotelling, Mood, Kempthorne,
and Banerjee.
CHAPTER I


ORIGIN  OF  THE  WEIGHING  PROBLEM 

In an article, "Complex Experiments", Yates [10] considered the
following problem: a chemist is given the task of determining the
weights of seven light objects, and the scale the chemist must use
requires a zero correction. The customary technique would be to
weigh each of the seven objects individually and then make an eighth
weighing with no objects on the scale. This eighth weighing would be
used to determine the zero correction factor.

Mathematically, the customary weighing technique for the chemist's
problem would be as follows. The seven objects will be denoted as
a, b, c, d, e, f, and g, and the scale bias will be denoted by z.

Weighing Number    Object Weighed    Scale Reading
       1               a + z              Y1
       2               b + z              Y2
       3               c + z              Y3
       4               d + z              Y4
       5               e + z              Y5
       6               f + z              Y6
       7               g + z              Y7
       8               z                  Y8
Using this notation, the weight of any object can be determined by
taking the difference between the scale reading when carrying the
object and the scale reading when no object is on the scale. For this
example, the weights of the seven objects would be determined as
follows:

$$a = Y_1 - Y_8,\quad b = Y_2 - Y_8,\quad c = Y_3 - Y_8,\quad d = Y_4 - Y_8,$$

$$e = Y_5 - Y_8,\quad f = Y_6 - Y_8,\quad g = Y_7 - Y_8.$$

Assuming that systematic errors are non-existent and that the
errors are random, the variance of each weighing may be denoted by
$\sigma^2$ and the standard error by $\sigma$. With these assumptions, since each
estimated weight is the difference of two independent readings, the
variance of the estimated weights using the customary weighing technique
is $2\sigma^2$ and the standard error is $\sigma\sqrt{2}$.

For an improvement over the customary weighing technique, Yates
suggested that the objects be weighed in combinations with each other
instead of individually. For example, Figure 1 presents Yates' technique
for the determination of the weights of seven objects.


We notice that in Yates' method each object is weighed four
times in combination with the other objects. In the four weighings of
a given object, every other object is included twice. In the remaining
four weighings, i.e., the weighings without the object, every other
object is also included twice. Therefore, the weight of any object
can be determined by adding the scale readings containing the object,
subtracting the scale readings not containing the object, and dividing
the result by 4. Using this procedure, the weights of the seven
objects would be determined as follows:

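$$
\begin{aligned}
a &= \tfrac{1}{4}(Y_1 + Y_2 + Y_3 + Y_4 - Y_5 - Y_6 - Y_7 - Y_8) \\
b &= \tfrac{1}{4}(Y_1 + Y_2 + Y_5 + Y_6 - Y_3 - Y_4 - Y_7 - Y_8) \\
c &= \tfrac{1}{4}(Y_1 + Y_3 + Y_5 + Y_7 - Y_2 - Y_4 - Y_6 - Y_8) \\
d &= \tfrac{1}{4}(Y_1 + Y_2 + Y_7 + Y_8 - Y_3 - Y_4 - Y_5 - Y_6) \\
e &= \tfrac{1}{4}(Y_1 + Y_3 + Y_6 + Y_8 - Y_2 - Y_4 - Y_5 - Y_7) \\
f &= \tfrac{1}{4}(Y_1 + Y_4 + Y_5 + Y_8 - Y_2 - Y_3 - Y_6 - Y_7) \\
g &= \tfrac{1}{4}(Y_1 + Y_4 + Y_6 + Y_7 - Y_2 - Y_3 - Y_5 - Y_8)
\end{aligned}
$$

Weighing Number    Objects Weighed              Scale Reading
       1           a + b + c + d + e + f + g         Y1
       2           a + b + d                         Y2
       3           a + c + e                         Y3
       4           a + f + g                         Y4
       5           b + c + f                         Y5
       6           b + e + g                         Y6
       7           c + d + g                         Y7
       8           d + e + f                         Y8

Figure 1. Yates' Weighing Design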


It should be noted that the bias (i.e., the zero scale correction)
cancels out in the above expressions. Since the variance of a sum of
independent observations is equal to the sum of the variances, the
variance of an estimated weight determined by Yates' technique is
$8\sigma^2/16 = \sigma^2/2$ and the standard error is $\sigma/\sqrt{2}$, whereas the variance and
standard error using the customary technique were $2\sigma^2$ and $\sigma\sqrt{2}$,
respectively. Therefore, Yates' weighing technique improved the precision
of the estimated weights as compared to the customary weighing technique
without increasing the number of weighing operations (both techniques
required eight weighing operations).
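The comparison above can be reproduced with a short sketch using exact arithmetic from Python's standard `fractions` module. It assumes the bias z is treated as an extra unknown (a leading column of ones in each design matrix); the helpers `gram`, `inverse` and `variance_factors` are illustrative names, not taken from the paper. The diagonal of $(X'X)^{-1}$ then gives each object's variance divided by $\sigma^2$.

```python
from fractions import Fraction


def gram(x):
    """Return X'X for a design matrix given as a list of rows."""
    cols = list(zip(*x))
    return [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]


def inverse(m):
    """Exact inverse of a nonsingular square matrix (Gauss-Jordan over Fractions)."""
    n = len(m)
    a = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        piv = next(r for r in range(col, n) if a[r][col] != 0)  # fails if singular
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]


OBJECTS = "abcdefg"

# Customary technique: weighings 1-7 carry one object each, weighing 8 is empty.
# Column 0 is the bias z, which enters every reading.
customary = [[1] + [int(i == j) for j in range(7)] for i in range(7)] + [[1] + [0] * 7]

# Yates' design (Figure 1): each string lists the objects on the pan.
YATES_ROWS = ["abcdefg", "abd", "ace", "afg", "bcf", "beg", "cdg", "def"]
yates = [[1] + [int(o in row) for o in OBJECTS] for row in YATES_ROWS]


def variance_factors(x):
    """Diagonal of (X'X)^{-1} for the objects (the bias column is skipped)."""
    c = inverse(gram(x))
    return [c[i][i] for i in range(1, len(c))]


print([str(f) for f in variance_factors(customary)])  # seven '2'   -> 2 sigma^2
print([str(f) for f in variance_factors(yates)])      # seven '1/2' -> sigma^2 / 2
```

Because both designs use eight weighings for eight unknowns (seven weights plus the bias), X is square here, so the least squares estimates coincide with the hand-derived formulas.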


This illustration by Yates, whereby the precision of an estimated
weight was increased (without additional weighings) by weighing the
objects in combinations rather than individually, was the origin of
the weighing problem.
CHAPTER II


HOTELLING’S  ENUNCIATION  OF  THE  WEIGHING  PROBLEM 

Hotelling [5] presented a further improvement by suggesting
that Yates' procedure be modified by placing in the other pan of the
scale those objects not included in each weighing as specified by Yates.
For example, in Yates' procedure (see Chapter I) one weighing of the
seven objects had the combination $a + b + d = Y_2$, which represented
the objects a, b, and d being weighed on the scale. Hotelling suggested
that, in addition to objects a, b, and d being placed in one pan for
weighing, the remaining objects, i.e., c, e, f, and g, be placed in the
other pan. Hotelling's weighing design for the determination of the
weights of seven light objects is presented in Figure 2. Using
Hotelling's procedure, the estimated weights of each of the seven
light objects would be given as follows:

$$
\begin{aligned}
a &= \tfrac{1}{8}(W_1 + W_2 + W_3 + W_4 - W_5 - W_6 - W_7 - W_8) \\
b &= \tfrac{1}{8}(W_1 + W_2 + W_5 + W_6 - W_3 - W_4 - W_7 - W_8) \\
c &= \tfrac{1}{8}(W_1 + W_3 + W_5 + W_7 - W_2 - W_4 - W_6 - W_8) \\
d &= \tfrac{1}{8}(W_1 + W_2 + W_7 + W_8 - W_3 - W_4 - W_5 - W_6) \\
e &= \tfrac{1}{8}(W_1 + W_3 + W_6 + W_8 - W_2 - W_4 - W_5 - W_7) \\
f &= \tfrac{1}{8}(W_1 + W_4 + W_5 + W_8 - W_2 - W_3 - W_6 - W_7) \\
g &= \tfrac{1}{8}(W_1 + W_4 + W_6 + W_7 - W_2 - W_3 - W_5 - W_8)
\end{aligned}
$$

Weighing Number    Objects Weighed                  Scale Reading
       1            a + b + c + d + e + f + g            W1
       2            a + b - c + d - e - f - g            W2
       3            a - b + c - d + e - f - g            W3
       4            a - b - c - d - e + f + g            W4
       5           -a + b + c - d - e + f - g            W5
       6           -a + b - c - d + e - f + g            W6
       7           -a - b + c + d - e - f + g            W7
       8           -a - b - c + d + e + f - g            W8

Figure 2. Hotelling's Weighing Design


The variance of each unknown weight by Hotelling's method is therefore
$8\sigma^2/64 = \sigma^2/8$ and the standard error is $\sigma/\sqrt{8}$. This standard error is half that
of Yates' method and a fourth of the value determined by the customary
method. Also, in Hotelling's method, as in Yates' method, the scale
bias (zero correction) cancels out in the equations for the
determination of the unknown weights.
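Hotelling's design can be checked the same way. This sketch assumes noise-free readings generated from arbitrary test weights and bias (the values and the helpers `readings` and `hotelling_estimates` are hypothetical); it confirms that the bias column is orthogonal to every object column and that dividing each signed column sum by 8 recovers the weights exactly.

```python
from fractions import Fraction

# Hotelling's design (Figure 2): +1 = object in the left pan, -1 = right pan.
# Columns are the objects a, ..., g.
HOTELLING = [
    [1, 1, 1, 1, 1, 1, 1],
    [1, 1, -1, 1, -1, -1, -1],
    [1, -1, 1, -1, 1, -1, -1],
    [1, -1, -1, -1, -1, 1, 1],
    [-1, 1, 1, -1, -1, 1, -1],
    [-1, 1, -1, -1, 1, -1, 1],
    [-1, -1, 1, 1, -1, -1, 1],
    [-1, -1, -1, 1, 1, 1, -1],
]


def gram(x):
    """Return X'X for a design matrix given as a list of rows."""
    cols = list(zip(*x))
    return [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]


def readings(weights, bias):
    """Noise-free readings W_i = z + sum_j x_ij * weight_j (hypothetical test data)."""
    return [bias + sum(x * w for x, w in zip(row, weights)) for row in HOTELLING]


def hotelling_estimates(w):
    """weight_j = (sum_i x_ij W_i) / 8, as in the equations for a, ..., g."""
    return [Fraction(sum(row[j] * wi for row, wi in zip(HOTELLING, w)), 8)
            for j in range(7)]


# With a leading bias column of ones, X'X is 8 times the 8 x 8 identity,
# so every variance factor is 1/8 and the bias is orthogonal to each object.
A = gram([[1] + row for row in HOTELLING])
print(all(A[i][j] == (8 if i == j else 0) for i in range(8) for j in range(8)))  # True

true_weights = [3, 1, 4, 1, 5, 9, 2]
print([str(e) for e in hotelling_estimates(readings(true_weights, bias=7))])
# ['3', '1', '4', '1', '5', '9', '2']: the bias cancels exactly
```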


The design principle inherent in both Yates' and Hotelling's
methods may be illustrated even better with reference to a simpler
example. Let us suppose that it is required to find the unknown
weights of two objects, a and b, and that the scale to be used is
already corrected for bias. If the two objects are weighed together
in one pan of the scale, and also in opposite pans, the equations for
the unknown weights will be

$$a + b = Y_1, \qquad a - b = Y_2,$$

where $Y_1$ and $Y_2$ denote the readings from the scale. From the above
we get

$$a = \frac{Y_1 + Y_2}{2} \quad\text{and}\quad b = \frac{Y_1 - Y_2}{2}.$$

If $\sigma^2$ is the variance of an individual weighing, the variance of both a and
b by this method is $2\sigma^2/4 = \sigma^2/2$. The standard error of both the
estimates, therefore, is $\sigma/\sqrt{2}$. Thus, with only 2 weighing operations, it
has been possible to obtain the standard error $\sigma/\sqrt{2}$ for both objects,
whereas if the objects were weighed separately, twice each, 4 weighing
operations would have been needed in all to obtain this standard error
for both. Weighing the objects in combination has, therefore, halved
the number of weighing operations required.


The  above,  therefore,  amply  illustrates  the  following  quotation 
due  to  Hotelling:  "When  several  quantities  are  to  be  ascertained 
there  is  frequently  an  opportunity  to  increase  the  accuracy  and  reduce 
the  cost  by  suitably  combining  in  one  experiment  what  might  ordinarily 
be  considered  separate  operations". 

In addition to the improvement on Yates' method, Hotelling
also gave a precise formulation of the weighing design problem. This
formulation, as later pointed out by Banerjee [1], may be interpreted
as follows:


Results of N weighing operations to determine the individual
weights of p light objects fit into the general linear hypothesis
model $Y = X\beta + \varepsilon$, where Y is an $N \times 1$ random observed vector of the
recorded weights; $X = (x_{ij})$, $i = 1, 2, \ldots, N$; $j = 1, 2, \ldots, p$, is
an $N \times p$ matrix of known quantities, with $x_{ij} = +1$, $-1$ or $0$ if, in
the ith weighing operation, the jth object is placed respectively in the
left pan, the right pan, or in neither; $\beta$ is a $p \times 1$ vector ($p \le N$)
representing the weights of the objects; and $\varepsilon$ is an $N \times 1$ unobserved
random vector such that $E(\varepsilon) = 0$ and $E(\varepsilon\varepsilon') = \sigma^2 I_N$.

Consistent with the signs that the elements $x_{ij}$ can take, the
record of the weighing is taken as positive or negative according
as the balancing weight is placed in the right pan or the left.

The matrix X is called the "design matrix". When X is of
full rank, that is, when $X'X$ is non-singular, the least squares
estimates of the weights are given by $\hat\beta = (X'X)^{-1}X'Y$, where $X'$ is the
transpose of X. The covariance matrix of the estimated weights is
given by $\operatorname{Cov}(\hat\beta) = \sigma^2 C$, where $C = (X'X)^{-1}$. The ith diagonal element
of C, $c_{ii}$, represents the variance factor for the ith object. The
objective of the weighing design problem is to obtain the design matrix
X such that the $c_{ii}$ are a minimum.
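A minimal, general-purpose version of the least squares formulas above, assuming a full-rank design and exact rational arithmetic; `least_squares`, `gram` and `inverse` are illustrative helper names. It is applied to the two-object chemical balance example with made-up readings.

```python
from fractions import Fraction


def gram(x):
    """Return X'X for a design matrix given as a list of rows."""
    cols = list(zip(*x))
    return [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]


def inverse(m):
    """Exact inverse of a nonsingular square matrix (Gauss-Jordan over Fractions)."""
    n = len(m)
    a = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        piv = next(r for r in range(col, n) if a[r][col] != 0)  # fails if singular
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]


def least_squares(x, y):
    """Return (beta_hat, C) with beta_hat = (X'X)^{-1} X'Y and C = (X'X)^{-1}."""
    c = inverse(gram(x))
    xty = [sum(row[j] * yi for row, yi in zip(x, y)) for j in range(len(x[0]))]
    beta = [sum(cij * v for cij, v in zip(ci, xty)) for ci in c]
    return beta, c


# Two objects weighed together (a + b) and in opposite pans (a - b),
# with hypothetical readings Y1 = 7 and Y2 = 3.
X = [[1, 1], [1, -1]]
beta, C = least_squares(X, [7, 3])
print([str(v) for v in beta])   # ['5', '2']: a = (7 + 3)/2, b = (7 - 3)/2
print(C[0][0], C[1][1])         # 1/2 1/2: each variance is sigma^2 / 2
```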

In this connection, Hotelling [5] proved the following Lemma:

Let $A = X'X = (a_{ij})$, $i, j = 1, 2, \ldots, p$. Then, if
$a_{12}, a_{13}, \ldots, a_{1p}$ ($= a_{21}, a_{31}, \ldots, a_{p1}$, respectively) are free to
vary while the other elements of A remain fixed, the maximum value of
$|A|/A_{11}$ is $a_{11}$, and it is attained when and only when
$a_{12} = a_{13} = \cdots = a_{1p} = 0$, where $A_{11}$ is the minor of A obtained by
deleting the first row and column.

From the above Lemma, it is evident that the variance of $\hat\beta_1$,
namely $\sigma^2 A_{11}/|A|$, cannot be less than $\sigma^2/a_{11}$, and that the variance
would reach this value only if the experiment is so arranged that the
elements of the first row and column of A, other than $a_{11}$, are all zero. This
minimum value, $\sigma^2/a_{11}$, will be attained when the first column of X
is orthogonal to all the others. It will also be clear that the
minimum minimorum [5] of the variance will be reached if the first
column of X is not only orthogonal to all the others, but also
consists entirely of +1's and -1's as its elements, so that $a_{11} = N$.
N is the maximum possible value that $a_{11}$ can take. The value of this
minimum minimorum will thus be equal to $\sigma^2/N$.

It is evident from the Lemma and the above discussion
that this minimum minimorum of the variance would be reached in respect
of all the estimates $\hat\beta_i$ ($i = 1, 2, \ldots, p$) if the design matrix X
is orthogonal in the sense that $X'X$ is diagonal with N on the
diagonal.
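The Lemma can be illustrated numerically. The sketch below assumes three small hypothetical designs with N = 4 weighings of p = 2 objects and computes $c_{11} = A_{11}/|A|$ next to the bound $1/a_{11}$; only the last design, whose first column is orthogonal to the other and consists of ±1 entries, reaches $1/N$. The helper names are illustrative.

```python
from fractions import Fraction


def gram(x):
    """Return X'X for a design matrix given as a list of rows."""
    cols = list(zip(*x))
    return [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]


def det(m):
    """Exact determinant by Gaussian elimination over Fractions."""
    a = [[Fraction(v) for v in row] for row in m]
    n, d = len(a), Fraction(1)
    for col in range(n):
        piv = next((r for r in range(col, n) if a[r][col] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != col:
            a[col], a[piv] = a[piv], a[col]
            d = -d
        d *= a[col][col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return d


def first_variance_factor(x):
    """Return (c_11, 1/a_11), with c_11 = A_11/|A| and A = X'X."""
    a = gram(x)
    minor = [row[1:] for row in a[1:]]   # delete the first row and column
    return det(minor) / det(a), Fraction(1, a[0][0])


# Hypothetical N = 4 designs for p = 2 objects.
designs = {
    "first column not orthogonal": [[1, 1], [1, 1], [1, -1], [0, 1]],
    "orthogonal, a_11 = 3": [[1, 1], [1, -1], [1, 0], [0, 1]],
    "orthogonal, first column all +/-1": [[1, 1], [-1, 1], [1, -1], [-1, -1]],
}
results = {name: first_variance_factor(x) for name, x in designs.items()}
for name, (c11, bound) in results.items():
    print(f"{name}: c_11 = {c11}, 1/a_11 = {bound}")
# c_11 = 4/11 > 1/3; c_11 = 1/3 = 1/a_11; c_11 = 1/4 = 1/N (minimum minimorum)
```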
CHAPTER III

TWO TYPES OF WEIGHING PROBLEMS (SPRING AND
CHEMICAL BALANCE) AND ILLUSTRATIONS OF EACH

In general, there are two distinct types of weighing problems.
These problems have been designated as the spring balance weighing
problem and the chemical balance weighing problem. In the spring
balance problem (only one pan is used), the design matrix X is composed
of elements which can assume only the values +1 or 0, where
+1 denotes that the object is to be placed in the pan and 0 denotes
that the object is not to be placed in the pan. In the chemical
balance problem (two pans are used), the design matrix X is composed
of elements $x_{ij}$ which can assume the values +1, -1, or 0, where +1
denotes that the object is placed in the left pan, -1 denotes that
the object is placed in the right pan, and 0 denotes that the object
is not placed in either pan. It is evident that the design originally
proposed by Yates (see Chapter I) was a spring balance design,
and the improved design proposed by Hotelling was really a chemical
balance design (see Chapter II).

Example  of  a  Spring  Balance  Weighing  Design 

As an example of a spring balance weighing design, consider
the problem of determining the weights of three objects (a, b, and c).
A spring balance weighing design matrix would be:

$$X = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix},$$

where the columns refer to the objects and the rows refer to the
weighing operations. For this example, the first weighing would have
objects a and b on the scale; object c would not be used in the first
weighing operation. The second weighing operation would have objects
a and c weighed together; object b would not be used in the second
weighing operation. Likewise, the third weighing operation would have
objects b and c weighed together; object a would not be used in the
third weighing operation.


The least squares estimates of the unknown weights are given
by $\hat\beta = (X'X)^{-1}X'Y$, and the covariance matrix of the estimated weights
is given by $\operatorname{Cov}(\hat\beta) = (X'X)^{-1}\sigma^2$, where X is the weighing design matrix
and Y is the vector of recorded weights of the weighing operations. For
this example,

$$X = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix}, \qquad
X'X = \begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix},$$

and

$$(X'X)^{-1} = \begin{pmatrix} 3/4 & -1/4 & -1/4 \\ -1/4 & 3/4 & -1/4 \\ -1/4 & -1/4 & 3/4 \end{pmatrix}.$$


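For the least squares estimates,

$$\begin{pmatrix} \hat a \\ \hat b \\ \hat c \end{pmatrix} = (X'X)^{-1}X'Y =
\begin{pmatrix} 1/2 & 1/2 & -1/2 \\ 1/2 & -1/2 & 1/2 \\ -1/2 & 1/2 & 1/2 \end{pmatrix}
\begin{pmatrix} Y_1 \\ Y_2 \\ Y_3 \end{pmatrix}.$$

The variance of each unknown weight is $\tfrac{3}{4}\sigma^2$ (as given by the diagonal
elements of $(X'X)^{-1}$).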
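The spring balance arithmetic above can be verified exactly. This sketch repeats the illustrative helpers `gram` and `inverse` so that it stands alone, and computes both $(X'X)^{-1}$ and the coefficient matrix $(X'X)^{-1}X'$ that maps the readings to the estimates.

```python
from fractions import Fraction


def gram(x):
    """Return X'X for a design matrix given as a list of rows."""
    cols = list(zip(*x))
    return [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]


def inverse(m):
    """Exact inverse of a nonsingular square matrix (Gauss-Jordan over Fractions)."""
    n = len(m)
    a = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        piv = next(r for r in range(col, n) if a[r][col] != 0)  # fails if singular
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]


X = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]   # rows: weighings; columns: a, b, c
C = inverse(gram(X))
# Coefficients mapping the readings (Y1, Y2, Y3) to the estimates (a, b, c):
# entry (i, j) of (X'X)^{-1} X' is sum_k C[i][k] * X[j][k].
coef = [[sum(C[i][k] * X[j][k] for k in range(3)) for j in range(3)] for i in range(3)]

for row in C:
    print([str(v) for v in row])       # first row: ['3/4', '-1/4', '-1/4']
for row in coef:
    print([str(v) for v in row])       # first row: ['1/2', '1/2', '-1/2']
```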


Example of a Chemical Balance Weighing Design


As an example of a chemical balance weighing design, consider
the problem of determining the weights of four objects (a, b, c, and d).
A chemical balance weighing design matrix would be:

$$X = \begin{pmatrix} 1 & 1 & 1 & 1 \\ -1 & 1 & -1 & 1 \\ -1 & -1 & 1 & 1 \\ 1 & -1 & -1 & 1 \end{pmatrix}$$


As in the spring balance example, the columns refer to the objects to
be weighed and the rows refer to the weighing operations. In this case,
the first weighing operation would have all four objects (a, b, c, and
d) placed in the left pan. The second weighing operation would have
objects b and d placed in the left pan and objects a and c in the
right pan. The third weighing would have objects a and b placed in
the right pan and objects c and d in the left pan. Likewise, the
fourth operation would have objects b and c placed in the right pan
and objects a and d placed in the left pan. Using the same notation
as in the spring balance example, we have

$$X = \begin{pmatrix} 1 & 1 & 1 & 1 \\ -1 & 1 & -1 & 1 \\ -1 & -1 & 1 & 1 \\ 1 & -1 & -1 & 1 \end{pmatrix}, \qquad
X'X = \begin{pmatrix} 4 & 0 & 0 & 0 \\ 0 & 4 & 0 & 0 \\ 0 & 0 & 4 & 0 \\ 0 & 0 & 0 & 4 \end{pmatrix},$$

and $(X'X)^{-1} = \tfrac{1}{4}I_4$, so the four estimated weights are uncorrelated,
each with variance $\sigma^2/4$.
CHAPTER IV

MOOD AND KEMPTHORNE'S CONTRIBUTIONS

Mood [7] has indicated that if N weighing operations are made
to determine the weights of N objects, the minimum variance that the
estimated weights could have is $\sigma^2/N$; further, this minimum variance
will be reached only when the design matrix is orthogonal
(orthogonal in the sense that $X'X$ is diagonal) with elements consisting
entirely of +1's and -1's. Thus, Mood [7] showed that the
problem of finding the best chemical balance design is related to
Hadamard matrices and the Hadamard determinant problem.

The theorem that Hadamard proved is as follows:

If the elements of a square matrix X of order N are restricted to
the range $-1 \le x_{ij} \le 1$, the maximum possible value of the determinant
of X is $N^{N/2}$, and when this maximum value is achieved, all $x_{ij} = \pm 1$.
The matrix X will also be orthogonal in the sense that $X'X$ will be
diagonal with all non-zero elements equal to N.

Such matrices are denoted by $H_N$. If $H_N$ exists for a given
N, $H_N$ is the best chemical balance design for $N = p$. If the number of
objects, p, to be weighed is less than N, the best design is one which
is derived from $H_N$ by selecting a number of columns equal to the
number of objects to be weighed.


Mood has further pointed out the work of Paley [8] and
Williamson [9] and showed that $H_{4k}$ exists for the range $0 < 4k < 100$
with the possible exception of $4k = 92$. The solution for $4k = 92$ was
later found by Baumert, Golomb, and Hall Jr. [3].

In summary, for those chemical balance designs where an $H_N$
exists, the determination of the optimum design is completely solved;
i.e., the optimum chemical balance design will be $H_N$, if $H_N$ exists.
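Constructing Hadamard matrices in general (the Paley and Williamson methods cited above) is beyond a short example, so this sketch swaps in Sylvester's doubling construction, which only produces orders that are powers of 2. The helper names are hypothetical. It checks that $H'H = NI$, that keeping any p < N columns preserves orthogonality, and that the 4 x 4 chemical balance matrix of Chapter III is also of this kind.

```python
def sylvester(n):
    """Hadamard matrix of order n = 2**m by Sylvester's doubling [[H, H], [H, -H]]."""
    if n < 1 or n & (n - 1):
        raise ValueError("Sylvester's construction needs n to be a power of 2")
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def gram(x):
    """Return X'X for a design matrix given as a list of rows."""
    cols = list(zip(*x))
    return [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]


def is_scaled_identity(a, n):
    """True if the square matrix a equals n times the identity."""
    size = len(a)
    return all(a[i][j] == (n if i == j else 0) for i in range(size) for j in range(size))


H8 = sylvester(8)
print(is_scaled_identity(gram(H8), 8))                       # True: H'H = 8I
# p = 5 objects with N = 8 weighings: keep any 5 columns of H_8.
print(is_scaled_identity(gram([row[:5] for row in H8]), 8))  # True
# The 4 x 4 chemical balance example of Chapter III.
CHEM = [[1, 1, 1, 1], [-1, 1, -1, 1], [-1, -1, 1, 1], [1, -1, -1, 1]]
print(is_scaled_identity(gram(CHEM), 4))                     # True
```

Each estimate from such a design has variance $\sigma^2/N$, the minimum minimorum of Chapter II.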

For the remainder of this dissertation we shall devote our
attention to the spring balance problem. As mentioned before, the
spring balance problem differs from the chemical balance problem in
that the elements of the design matrix X can only assume values of +1
or 0, whereas in the chemical balance design the elements of the
design matrix X can assume values of +1, -1, or 0.


We notice that Hadamard's theorem does not directly apply to
the spring balance problem, since the elements of the design matrix X
can only assume values of +1 and 0. For spring balance problems where
$N = p$ and $N \equiv 3 \pmod 4$, Mood showed that the best possible spring
balance design is determined by $H_{N+1}$, if $H_{N+1}$ exists. Mood's
method of construction of these best possible designs is as follows:

Let $K_{N+1}$ denote a matrix formed from $H_{N+1}$ by adding or
subtracting the elements of the first row of $H_{N+1}$ from the corresponding
elements of the other rows in such a way as to make the first
element of each of the remaining rows zero. Obviously, since adding or
subtracting one row from another leaves a determinant unchanged,

$$|K_{N+1}| = |H_{N+1}|.$$

Except for the first row, the elements of $K_{N+1}$ are 0 and $\pm 2$, with
the signs of the non-zero elements being the same for elements in the
same row. Let $L_N$ be the matrix obtained by omitting the first row
and column of $K_{N+1}$, by changing all non-zero elements to +1, and by
permuting two rows, if necessary, to make the determinant of $L_N$
positive. Then

$$|H_{N+1}| = 2^N |L_N|.$$


It is clear that, given $L_N$, one could reverse the procedure and
determine an $H_{N+1}$. In the same manner, there is a correspondence in
general between square matrices with elements $\pm 1$ and square matrices
of one less order with elements 0 and 1. The ratio of the values of
corresponding determinants is always $2^N$, if their determinants do not
vanish; hence the (0, 1)-determinant will always have its maximum
value when its corresponding (+1, -1)-determinant has its maximum
possible value. Thus, $|L_N|$ is the maximum value possible for a
determinant of 0's and 1's of order N, and the value of $|L_N|$ is

$$|L_N| = \frac{(N+1)^{(N+1)/2}}{2^N}.$$

The variances of the estimated weights will be $\sigma^2 a^{ii} = 4N\sigma^2/(N+1)^2$,
where $a^{ii}$ denotes the ith diagonal element of $A^{-1} = (X'X)^{-1}$. We
knew in advance that $a^{ii}$ would be greater than $1/N$, since an optimum
design cannot exist unless the design matrix has its elements equal to
$\pm 1$, and the spring balance design is restricted to elements +1 or 0.
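Mood's construction can be traced on a small case. This sketch assumes $H_8$ from Sylvester's construction (so N = 7, and $N \equiv 3 \pmod 4$); it first normalizes the matrix so that its first row and column are all +1, which makes the non-zero entries of each lower row of $K_{N+1}$ equal to -2, then builds $L_N$ by writing 1 wherever $H_{N+1}$ has -1 outside the first row and column. The helper names `normalize` and `mood_L` are illustrative.

```python
from fractions import Fraction


def sylvester(n):
    """Hadamard matrix of order n (a power of 2) by Sylvester's doubling."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def gram(x):
    """Return X'X for a design matrix given as a list of rows."""
    cols = list(zip(*x))
    return [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]


def det(m):
    """Exact determinant by Gaussian elimination over Fractions."""
    a = [[Fraction(v) for v in row] for row in m]
    n, d = len(a), Fraction(1)
    for col in range(n):
        piv = next((r for r in range(col, n) if a[r][col] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != col:
            a[col], a[piv] = a[piv], a[col]
            d = -d
        d *= a[col][col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return d


def inverse(m):
    """Exact inverse of a nonsingular square matrix (Gauss-Jordan over Fractions)."""
    n = len(m)
    a = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        piv = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]


def normalize(h):
    """Flip column signs, then row signs, so the first row and column are all +1."""
    h = [[v * h[0][j] for j, v in enumerate(row)] for row in h]
    return [[v * row[0] for v in row] for row in h]


def mood_L(h):
    """L_N from H_{N+1}: after normalizing, subtracting row 1 from each other row
    leaves 0 or -2 off the first column; the -2 entries become 1."""
    hn = normalize(h)
    return [[int(v == -1) for v in row[1:]] for row in hn[1:]]


H8 = sylvester(8)
L7 = mood_L(H8)
N = len(L7)
print(abs(det(H8)), 2 ** N * abs(det(L7)))      # 4096 4096: |H_{N+1}| = 2^N |L_N|
C = inverse(gram(L7))
print(C[0][0], Fraction(4 * N, (N + 1) ** 2))   # 7/16 7/16
```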

For spring balance problems when $N > p$, Mood has presented
the following approach for obtaining optimum spring balance designs.

Let $P_r$ be a matrix whose rows are all the arrangements of r
ones and $(p - r)$ zeroes ($0 \le r \le p$). (The symbol should also have a
subscript p, but that is omitted because any specific value for p will
always be clear from the context.) The matrix $P_r$ will have p columns and
$\binom{p}{r}$ rows. Let X be a matrix made up of matrices $P_r$ arranged in
vertical order, and let $n_r$ be the number of times $P_r$ is used in
constructing X. The matrix X is then a weighing design for p objects with
$N = \sum_r n_r \binom{p}{r}$ weighings. Using these notations, Mood has proven the
following two theorems giving the best spring balance designs:

Theorem (1): If $p = 2k - 1$, where k is a positive integer, and if N
contains the factor $\binom{p}{k}$, then $|A|$ (the determinant of $A = X'X$) will
be maximized when $n_k = N/\binom{p}{k}$ and all other $n_r = 0$.

Theorem (2): If $p = 2k$, where k is a positive integer, and if N contains
the factor $\binom{p+1}{k+1}$, then $|A|$ will be maximized when
$n_k = n_{k+1} = N/\binom{p+1}{k+1}$ and all other $n_r = 0$.


When p is odd, Mood observed that $P_k$ is a design which not
only minimizes the confidence region for estimating the weights, but
also minimizes the individual variance factors. When, however, p is
even, Mood observed that the variance factors may not be the minimum,
and surmised that the best design from the point of view of
minimum variance factors would be made up largely from $P_k$ and a small
proportion from $P_{k+1}$.
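Mood's $P_r$ construction is easy to enumerate with `itertools.combinations`. The sketch below is a spot check, not a proof: it compares $|X'X|$ and the common variance factor for p = 3 with N = 3, and for p = 4 with N = 10, where the two candidate stacks are simply the ways of reaching N = 10 with whole $P_r$ blocks that are tried here. The helper `P` and the candidate list are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import combinations


def gram(x):
    """Return X'X for a design matrix given as a list of rows."""
    cols = list(zip(*x))
    return [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]


def det(m):
    """Exact determinant by Gaussian elimination over Fractions."""
    a = [[Fraction(v) for v in row] for row in m]
    n, d = len(a), Fraction(1)
    for col in range(n):
        piv = next((r for r in range(col, n) if a[r][col] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != col:
            a[col], a[piv] = a[piv], a[col]
            d = -d
        d *= a[col][col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return d


def inverse(m):
    """Exact inverse of a nonsingular square matrix (Gauss-Jordan over Fractions)."""
    n = len(m)
    a = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        piv = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]


def P(p, r):
    """All arrangements of r ones and p - r zeros: C(p, r) rows, p columns."""
    return [[int(j in c) for j in range(p)] for c in combinations(range(p), r)]


candidates = {
    "p=3, P1": P(3, 1),
    "p=3, P2": P(3, 2),
    "p=4, P1 + P2": P(4, 1) + P(4, 2),
    "p=4, P2 + P3": P(4, 2) + P(4, 3),
}
summary = {}
for name, x in candidates.items():
    a = gram(x)
    summary[name] = (len(x), det(a), inverse(a)[0][0])  # N, |X'X|, variance factor
    print(name, *summary[name])
# p=3, P1 3 1 1
# p=3, P2 3 4 3/4
# p=4, P1 + P2 10 189 2/7
# p=4, P2 + P3 10 405 4/15
```

For p = 3 the winner, $P_2$, is exactly the spring balance example of Chapter III.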

Kempthorne [6] discussed the weighing problem from the point
of view of factorial experiments and in [6] has given rules by which
the fractional designs may be constructed. Kempthorne has indicated
that the fractional designs have the following properties:

(1) The design automatically takes care of any bias in the
balance.

(2) The effects or weights may be easily computed.

(3) The effects or weights are uncorrelated.

(4) All the weights are measured with the same precision.

(5) An estimate of the experimental error which is independent
of the effects may be computed from the results.

Kempthorne also compared his fractional designs with the designs
proposed by Mood and found that the fractional factorial designs
yield estimates which have a somewhat higher variance than Mood's
designs. Kempthorne also indicated that the increase in precision in
Mood's designs had been obtained at the expense of having correlated
estimates which are subject to any bias that the measuring instrument
may have. For these reasons, Kempthorne doubted whether the use of
Mood's designs for any practical problem could be justified.


Subsequent to this remark by Kempthorne, Banerjee [1] showed
that the designs of Mood are a special class of symmetrical balanced
incomplete block designs and that the designs could easily be
modified to provide orthogonal estimates, as was referred to by Kempthorne.

In addition, since the designs are a subset of symmetrical balanced
incomplete block designs, Banerjee [2] developed a general method to
show how BIBDs in general could be made to provide orthogonal estimates
when used as weighing designs. A detailed description of Banerjee's
method is presented in the next chapter.
CHAPTER V


BANERJEE'S METHOD OF MODIFYING BIBD TO PROVIDE
ORTHOGONAL ESTIMATES IN WEIGHING DESIGNS

As was mentioned in Chapter IV, Kempthorne noted that although
the optimum designs for the spring balance problem suggested by Mood
furnish a somewhat smaller variance than that given by fractional
replicates, Mood's designs have the disadvantage that the estimates are
correlated, whereas the estimates furnished by fractional replicates
are orthogonal. Banerjee [1] has shown that the optimum designs of
Mood may also be made to furnish orthogonal estimates when the designs
are adjusted to suit estimation in a biased spring balance. Since the
optimum designs, $L_N$, of Mood are a special class of symmetrical
BIBDs, the question arises whether it would be possible, by a
similar type of adjustment, to provide orthogonal estimates when BIBDs
are used as spring balance weighing designs. A complete, detailed procedure
indicating how balanced incomplete block designs in general may be
made to furnish orthogonal estimates in weighing designs was presented
by Banerjee [2]. Banerjee's procedure is presented in the following.

Usually v denotes the number of varieties and b denotes the number
of blocks in a balanced incomplete block design. However, in weighing
designs, v will be used to denote the number of objects to be weighed

For the determination of the variances and covariances, we need
to determine the inverse of $X'X$, which for a BIBD has the form
$X'X = (r - \lambda)I_v + \lambda J_v$, where $J_v$ is the $v \times v$ matrix of ones. The
diagonal elements of the inverse matrix $(X'X)^{-1}$ represent the variance
factors, so that every variance will be equal to

$$\frac{rk - \lambda}{rk(r - \lambda)}\,\sigma^2 .$$

The off-diagonal elements represent the covariance terms, so that every
covariance will be equal to

$$-\frac{\lambda}{rk(r - \lambda)}\,\sigma^2 .$$

Since these off-diagonal elements are not zero, we see that the
estimates are correlated.
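The correlated estimates of an unmodified BIBD can be seen on a small hypothetical example, the $(v, b, r, k, \lambda) = (7, 7, 3, 3, 1)$ design obtained by developing the block {0, 1, 3} modulo 7. The sketch assumes an unbiased spring balance, so X is just the incidence matrix with blocks as weighings, and compares $(X'X)^{-1}$ with the two formulas above. Helper names are illustrative.

```python
from fractions import Fraction


def gram(x):
    """Return X'X for a design matrix given as a list of rows."""
    cols = list(zip(*x))
    return [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]


def inverse(m):
    """Exact inverse of a nonsingular square matrix (Gauss-Jordan over Fractions)."""
    n = len(m)
    a = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        piv = next(r for r in range(col, n) if a[r][col] != 0)  # fails if singular
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]


# Hypothetical BIBD: develop the block {0, 1, 3} modulo 7.
v, b, r, k, lam = 7, 7, 3, 3, 1
BLOCKS = [tuple((x + s) % v for x in (0, 1, 3)) for s in range(b)]
X = [[int(j in blk) for j in range(v)] for blk in BLOCKS]  # rows = weighings

C = inverse(gram(X))
var_factor = Fraction(r * k - lam, r * k * (r - lam))
cov_factor = Fraction(-lam, r * k * (r - lam))
print(C[0][0], var_factor)   # 4/9 4/9
print(C[0][1], cov_factor)   # -1/18 -1/18: the estimates are correlated
```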



Banerjee [2] has suggested the following procedure for modifying
the BIBD to provide orthogonal estimates. Taking the bias as an
additional object to be weighed, a column of ones and only one row of
zeroes, in that order, may be added to the BIBD design matrix to correspond
to the bias assumed as an additional object; the added row is zero in every
object column, so it represents a weighing with no object on the pan. The
modified design will then be suitable for the estimation of the weights.
This really means making one additional weighing to obtain an estimate of
the bias, while in the other weighings the bias will automatically be
included. If, however, a column of ones and t rows of zeroes are added to
the design matrix, this implies that t weighings will be devoted to the
estimation of the bias. In such a situation, the matrix $X'X$ will
be of the form


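$$X'X = \begin{pmatrix}
b+t & r & r & \cdots & r \\
r & r & \lambda & \cdots & \lambda \\
r & \lambda & r & \cdots & \lambda \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
r & \lambda & \lambda & \cdots & r
\end{pmatrix}.$$

Because of the inclusion of the bias, the order of this matrix is
$(v + 1) \times (v + 1)$.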


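We would like to obtain the inverse of this matrix in the
following manner. The determinant is

$$|X'X| = (r-\lambda)^{v-1}\left[(b+t)\{r + \lambda(v-1)\} - v r^2\right].$$

On simplifying, using the BIBD relations $\lambda(v-1) = r(k-1)$ and $bk = vr$,
we obtain

$$
\begin{aligned}
|X'X| &= (r-\lambda)^{v-1}\left[t\{r + \lambda(v-1)\} + b\{r + \lambda(v-1)\} - r^2 v\right] \\
&= (r-\lambda)^{v-1}\left[t\{r + \lambda(v-1)\} + b\{r + r(k-1)\} - r^2 v\right] \\
&= (r-\lambda)^{v-1}\left[t\{r + \lambda(v-1)\} + bkr - r^2 v\right] \\
&= (r-\lambda)^{v-1}\left[t\{r + \lambda(v-1)\}\right] \\
&= t\,(r-\lambda)^{v-1}\{r + \lambda(v-1)\}. \qquad (1)
\end{aligned}
$$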

In a similar fashion, the value of the determinant obtained
after suppressing the first row and the first column of $X'X$ can be
shown to be

$$(r-\lambda)^{v-1}\{r + \lambda(v-1)\}. \qquad (2)$$

The value of the determinant after suppressing the second row and
second column of $X'X$ is

$$(r-\lambda)^{v-2}\left[(b+t)\{r + \lambda(v-2)\} - r^2(v-1)\right]. \qquad (3)$$

The value of the determinant obtained after suppressing the first row
and second column of $X'X$ is

$$r\,(r-\lambda)^{v-1}. \qquad (4)$$

The value of the determinant obtained after suppressing the second
row and the third column of $X'X$ is

$$(r-\lambda)^{v-2}\{\lambda(b+t) - r^2\}. \qquad (5)$$

For any two estimates to be orthogonal, the off-diagonal
element of $(X'X)^{-1}$ corresponding to those estimators must be equal to
zero. Since that element is, apart from sign, the corresponding minor
divided by $|X'X|$, and keeping the bias out of consideration, we would like
expression (5) to be equal to zero. That is, to obtain orthogonal
estimates, the value of t must be chosen such that

$$(r-\lambda)^{v-2}\{\lambda(b+t) - r^2\} = 0.$$

Since r cannot be equal to $\lambda$, this requires

$$\lambda(b+t) - r^2 = 0, \quad\text{i.e.,}\quad t = \frac{r^2}{\lambda} - b.$$

Using  this  value  for  t,  the  matrix  [X’X]“7  becomes: 


This  expression  shows  that,  except  for  the  bias,  the  other  estimates 
are  mutually  orthogonal.  The  estimates  given  by  $  =  £X'XJ-^  X'Y  will 
be  given  as: 


$$\hat{\beta} = [X'X]^{-1}X'Y = \begin{pmatrix} \frac{1}{t} & -\frac{1}{tk}\mathbf{1}' \\ -\frac{1}{tk}\mathbf{1} & \frac{1}{r-\lambda}I_v \end{pmatrix} X'Y.$$
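A quick way to confirm the inverse stated above is to multiply it against [X'X] and check for the identity. This is a hedged sketch in exact rational arithmetic; the helper names are ours. The second parameter set (v = 8, b = 14, r = 7, k = 4, λ = 3) only shows that the algebra holds for fractional t; in practice t rows can be added only when t is an integer.

```python
from fractions import Fraction


def matmul(p, q):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*q)] for row in p]


def xtx(v, b, r, lam, t):
    """[X'X] for a BIBD plus a bias column of ones and t rows (1, 0, ..., 0)."""
    m = [[Fraction(b) + t] + [Fraction(r)] * v]
    for i in range(v):
        m.append([Fraction(r)] + [Fraction(r if i == j else lam) for j in range(v)])
    return m


def closed_form_inverse(v, r, k, lam, t):
    """[[1/t, -1/(tk) 1'], [-1/(tk) 1, I/(r - lam)]]."""
    off = -1 / (t * k)
    m = [[1 / t] + [off] * v]
    for i in range(v):
        m.append([off] + [Fraction(1, r - lam) if i == j else Fraction(0) for j in range(v)])
    return m


def check(v, b, r, k, lam):
    t = Fraction(r * r, lam) - b          # from lambda (b + t) - r^2 = 0
    p = matmul(xtx(v, b, r, lam, t), closed_form_inverse(v, r, k, lam, t))
    identity = all(p[i][j] == (1 if i == j else 0)
                   for i in range(v + 1) for j in range(v + 1))
    return t, identity


print(check(7, 7, 4, 4, 2))    # t = 1
print(check(8, 14, 7, 4, 3))   # t = 7/3
```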


Given a balanced incomplete block design with parameters v, b, r, k and λ, it is always possible to obtain another balanced incomplete block design with parameters

$$\begin{aligned} v_0 &= v, \\ b_0 &= b, \\ r_0 &= b - r, \\ k_0 &= v - k, \\ \lambda_0 &= b - 2r + \lambda. \end{aligned}$$

The two balanced incomplete block designs derived in such a fashion are said to be complementary to one another.
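The complementary parameters are simple arithmetic on (v, b, r, k, λ). The small sketch below (function names are ours) computes them and checks the necessary BIBD identities bk = rv and λ(v − 1) = r(k − 1); taking the complement twice returns the original design.

```python
def complement(v, b, r, k, lam):
    """Parameters (v0, b0, r0, k0, lambda0) of the complementary BIBD."""
    return v, b, b - r, v - k, b - 2 * r + lam


def satisfies_bibd_identities(v, b, r, k, lam):
    """Necessary BIBD conditions: bk = rv and lambda(v - 1) = r(k - 1)."""
    return b * k == r * v and lam * (v - 1) == r * (k - 1)


for design in [(7, 7, 4, 4, 2), (9, 12, 4, 3, 1)]:
    comp = complement(*design)
    print(design, "->", comp, satisfies_bibd_identities(*comp))
# (7, 7, 4, 4, 2) -> (7, 7, 3, 3, 1) True
# (9, 12, 4, 3, 1) -> (9, 12, 8, 6, 5) True
```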


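When an integral value of t (t = r²/λ − b) is available, it is also possible to obtain orthogonal estimates in the complementary design. This is done by adding a column of ones and t rows of ones, in that order, to the design matrix X. The matrix [X'X] will then have the form

$$[X'X] = \begin{pmatrix} b_0+t & r_0+t & r_0+t & \cdots & r_0+t \\ r_0+t & r_0+t & \lambda_0+t & \cdots & \lambda_0+t \\ r_0+t & \lambda_0+t & r_0+t & \cdots & \lambda_0+t \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ r_0+t & \lambda_0+t & \lambda_0+t & \cdots & r_0+t \end{pmatrix}.$$

The value of the determinant obtained after suppressing the second row and the third column of [X'X] in this case will be given by:

$$[(r_0+t) - (\lambda_0+t)]^{v-2}\left[(\lambda_0+t)(b_0+t) - (r_0+t)^2\right].$$

To have orthogonal estimates for all the weights except the bias, this quantity must be zero, i.e.

$$(r_0-\lambda_0)^{v-2}\left[(\lambda_0+t)(b_0+t) - (r_0+t)^2\right] = 0,$$

where

$$t = \frac{r_0^2 - \lambda_0 b_0}{b_0 + \lambda_0 - 2r_0} = \frac{r^2}{\lambda} - b.$$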

The estimated weights of the objects will be given by

$$\hat{\beta} = [X'X]^{-1}X'Y.$$
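The complementary construction can be checked numerically under our own assumptions. We take t = r²/λ − b from the original design (for the complement b0 + λ0 − 2r0 = λ and r0² − λ0b0 = r² − bλ, so the orthogonality condition gives the same t), build [X'X] for the complement with t rows of ones, and invert it exactly. Because r0 − λ0 = r − λ, the object block should come out as I/(r − λ), the same variance factor as in the original design. All helper names are hypothetical.

```python
from fractions import Fraction


def inverse(m):
    """Exact Gauss-Jordan inverse of a nonsingular matrix."""
    n = len(m)
    a = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(m)]
    for c in range(n):
        p = next(i for i in range(c, n) if a[i][c] != 0)
        a[c], a[p] = a[p], a[c]
        pivot = a[c][c]
        a[c] = [x / pivot for x in a[c]]
        for i in range(n):
            if i != c and a[i][c] != 0:
                f = a[i][c]
                a[i] = [x - f * z for x, z in zip(a[i], a[c])]
    return [row[n:] for row in a]


def complementary_xtx(v, b, r, k, lam):
    """[X'X] for the complement of (v, b, r, k, lam) plus a bias column and t rows of ones."""
    b0, r0, lam0 = b, b - r, b - 2 * r + lam
    t = Fraction(r * r, lam) - b             # same t as for the original design
    m = [[b0 + t] + [r0 + t] * v]
    for i in range(v):
        m.append([r0 + t] + [r0 + t if i == j else lam0 + t for j in range(v)])
    return t, m


def object_block_is_diagonal(v, b, r, k, lam):
    t, m = complementary_xtx(v, b, r, k, lam)
    inv = inverse(m)
    return all(inv[i][j] == (Fraction(1, r - lam) if i == j else 0)
               for i in range(1, v + 1) for j in range(1, v + 1))


print(object_block_is_diagonal(7, 7, 3, 3, 1))    # complement (7, 7, 4, 4, 2), t = 2
print(object_block_is_diagonal(9, 12, 4, 3, 1))   # complement (9, 12, 8, 6, 5), t = 4
```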

It may be noticed that the variance factors for the objects in the complementary designs are the same as those of the original design. It may be further noticed that for the designs suggested by Mood, t = 1, as was pointed out before. This means that orthogonal estimates may be obtained for LN by adding one column of ones and only one row of zeroes to the design matrix. As Banerjee [1] has shown, this modified LN design is identically the same as that given by Kempthorne's fractional replication designs.

The above procedure will fail to furnish orthogonal estimates when r²/λ is not an integral number, i.e. when t will not be an integral number. For these situations Banerjee has suggested the following procedure for determining orthogonal estimates. Let ε be the least positive integer such that (r + ε)² is divisible by (λ + ε). Then, if a column of ones, ε rows of ones and n rows of zeroes, in that order, are added to the design matrix to suit the estimation of the bias in a biased spring balance, the matrix [X'X] will become:

$$[X'X] = \begin{pmatrix} b+n+\varepsilon & r+\varepsilon & r+\varepsilon & \cdots & r+\varepsilon \\ r+\varepsilon & r+\varepsilon & \lambda+\varepsilon & \cdots & \lambda+\varepsilon \\ r+\varepsilon & \lambda+\varepsilon & r+\varepsilon & \cdots & \lambda+\varepsilon \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ r+\varepsilon & \lambda+\varepsilon & \lambda+\varepsilon & \cdots & r+\varepsilon \end{pmatrix}$$

With n chosen so that the orthogonality condition obtained below holds, the value of the determinant |X'X| is given by

$$|X'X| = (r-\lambda)^{v}(b+\varepsilon+n).$$

The value of the determinant obtained after suppressing the first row and first column of [X'X] is given by

$$(r-\lambda)^{v-1}\{r+\lambda(v-1)+\varepsilon v\}.$$

The value of the determinant obtained after suppressing the second row and second column of [X'X] is given by

$$(r-\lambda)^{v-1}(b+\varepsilon+n).$$

The value of the determinant obtained after suppressing the first row and second column is given by

$$(r+\varepsilon)(r-\lambda)^{v-1}.$$

The value of the determinant of [X'X] obtained by suppressing the second row and third column will be equal to

$$[(r+\varepsilon)-(\lambda+\varepsilon)]^{v-2}\left[(b+\varepsilon+n)(\lambda+\varepsilon) - (r+\varepsilon)^2\right].$$

Setting this expression equal to zero, we obtain

$$(b+\varepsilon+n)(\lambda+\varepsilon) = (r+\varepsilon)^2, \quad\text{or}\quad b+\varepsilon+n = \frac{(r+\varepsilon)^2}{\lambda+\varepsilon}, \quad\text{or}\quad n = \frac{(r+\varepsilon)^2}{\lambda+\varepsilon} - (b+\varepsilon).$$

Hence, the value of n is determined.
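Banerjee's choice of ε and n is a short search. The sketch below (function name and search bound `max_eps` are our own assumptions) returns the least positive ε with (r + ε)² divisible by (λ + ε) and the resulting n. It reproduces ε = 1, n = 1 for the v = 8, b = 14, r = 7, k = 4, λ = 3 design used later, returns a negative n for v = 10, b = 30, r = 9, k = 3, λ = 2, and finds no ε for v = 4, b = 4, r = 3, k = 3, λ = 2, because (λ + ε + 1)² leaves remainder 1 on division by λ + ε.

```python
def banerjee_eps_n(b, r, lam, max_eps=10_000):
    """Least positive eps with (r + eps)^2 divisible by (lam + eps), and the
    corresponding n = (r + eps)^2 / (lam + eps) - (b + eps).

    Returns None if no eps <= max_eps works (the bound is our own choice).
    """
    for eps in range(1, max_eps + 1):
        q, rem = divmod((r + eps) ** 2, lam + eps)
        if rem == 0:
            return eps, q - (b + eps)
    return None


print(banerjee_eps_n(b=14, r=7, lam=3))   # (1, 1)
print(banerjee_eps_n(b=30, r=9, lam=2))   # (5, -7): n is negative
print(banerjee_eps_n(b=4, r=3, lam=2))    # None
```

A negative n means the construction cannot be carried out, since rows cannot be removed.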


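Using these values the matrix [X'X]^{-1} will reduce to the following form,

$$[X'X]^{-1} = \begin{pmatrix} A & -C & -C & \cdots & -C \\ -C & \frac{1}{r-\lambda} & 0 & \cdots & 0 \\ -C & 0 & \frac{1}{r-\lambda} & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ -C & 0 & 0 & \cdots & \frac{1}{r-\lambda} \end{pmatrix},$$

where

$$A = \frac{r+\lambda(v-1)+\varepsilon v}{(b+\varepsilon+n)(r-\lambda)}$$

and

$$C = \frac{r+\varepsilon}{(b+\varepsilon+n)(r-\lambda)}.$$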


This procedure shows that a more general class of BIBD designs may be made to furnish orthogonal estimates, as indicated by the off-diagonal elements of [X'X]^{-1} being zero except for the first row and column, which correspond to the bias and not the objects.
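The general inverse with corner A and border −C can be verified by multiplication, in the same spirit as before. In this sketch (helper names are ours) the first call uses the v = 8, b = 14, r = 7, λ = 3 design with ε = n = 1, where A = 9/16 and C = 1/8; the second shows that ε = 0, n = t = 1 recovers the integer-t case (A = 1/t = 1, C = 1/(tk) = 1/4) for v = b = 7, r = k = 4, λ = 2.

```python
from fractions import Fraction


def matmul(p, q):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*q)] for row in p]


def xtx(v, b, r, lam, eps, n):
    """[X'X] after adding a bias column, eps rows of ones and n rows (1, 0, ..., 0)."""
    m = [[Fraction(b + eps + n)] + [Fraction(r + eps)] * v]
    for i in range(v):
        m.append([Fraction(r + eps)] +
                 [Fraction(r + eps if i == j else lam + eps) for j in range(v)])
    return m


def closed_form_inverse(v, b, r, lam, eps, n):
    """A in the corner, -C on the border, I/(r - lam) for the objects."""
    denom = (b + eps + n) * (r - lam)
    A = Fraction(r + lam * (v - 1) + eps * v, denom)
    C = Fraction(r + eps, denom)
    m = [[A] + [-C] * v]
    for i in range(v):
        m.append([-C] + [Fraction(1, r - lam) if i == j else Fraction(0) for j in range(v)])
    return m, A, C


def check(v, b, r, lam, eps, n):
    inv, A, C = closed_form_inverse(v, b, r, lam, eps, n)
    p = matmul(xtx(v, b, r, lam, eps, n), inv)
    ok = all(p[i][j] == (1 if i == j else 0) for i in range(v + 1) for j in range(v + 1))
    return ok, A, C


print(check(8, 14, 7, 3, eps=1, n=1))   # expect True, A = 9/16, C = 1/8
print(check(7, 7, 4, 2, eps=0, n=1))    # expect True, A = 1, C = 1/4
```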


As an illustration of the above development, Banerjee [2] presented the following example. The design matrix X of a symmetrical balanced incomplete block design with v = 7, b = 7, r = 4, k = 4, and λ = 2 is given by

$$X = \begin{pmatrix} 1&0&1&0&1&0&1 \\ 0&1&1&0&0&1&1 \\ 0&0&0&1&1&1&1 \\ 1&1&0&0&1&1&0 \\ 0&1&1&1&1&0&0 \\ 1&0&1&1&0&1&0 \\ 1&1&0&1&0&0&1 \end{pmatrix}$$

where the rows refer to the weighing operations and the columns refer to the objects b1, b2, ..., b7 to be weighed.


Seven small copper pieces have been arbitrarily chosen for this illustration. The results of the weighings (in grams) of the combinations of objects given in the above design matrix are:

Y1 = 10.76251
Y2 = 7.83798
Y3 = 6.11380
Y4 = 12.07808
Y5 = 8.90452
Y6 = 10.63856
Y7 = 12.80768

Using these values and the normal equations, the estimates of the seven objects are given by:

b1 = 5.85763
b2 = 3.52835
b3 = 1.78600
b4 = 1.94650
b5 = 1.64367
b6 = 1.04843
b7 = 1.47520
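The unmodified estimates can be reproduced from the reported weighings. The incidence matrix `X0` below is an assumption on our part: it is the printed matrix with its first column (object b1) restored as (1, 0, 0, 1, 0, 1, 1), which gives every row k = 4 ones and [X'X] = 2I + 2J. The sketch uses the closed form [X'X]^{-1} = (I − λ/(rk) J)/(r − λ) for the plain BIBD (no bias term).

```python
from fractions import Fraction

# Reconstructed incidence matrix (assumption): rows are weighings,
# columns are objects b1..b7; v = b = 7, r = k = 4, lambda = 2.
X0 = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
    [1, 1, 0, 0, 1, 1, 0],
    [0, 1, 1, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [1, 1, 0, 1, 0, 0, 1],
]
Y = [Fraction(s) for s in ("10.76251", "7.83798", "6.11380", "12.07808",
                           "8.90452", "10.63856", "12.80768")]


def bibd_estimates(X0, Y, r, k, lam):
    """Least squares estimates using [X'X]^{-1} = (I - lam/(rk) J) / (r - lam)."""
    T = [sum(row[j] * y for row, y in zip(X0, Y)) for j in range(len(X0[0]))]  # X'Y
    total = sum(T)
    return [(Tj - Fraction(lam, r * k) * total) / (r - lam) for Tj in T]


est = bibd_estimates(X0, Y, r=4, k=4, lam=2)
print([f"{float(e):.7f}" for e in est])
```

Each value agrees with the reported b1, ..., b7 to within 3 × 10⁻⁶, i.e. up to the rounding used in the text.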


For this example the matrix [X'X] is of order 7 × 7 and is given as follows:

$$[X'X] = \begin{pmatrix} 4&2&2&2&2&2&2 \\ 2&4&2&2&2&2&2 \\ 2&2&4&2&2&2&2 \\ 2&2&2&4&2&2&2 \\ 2&2&2&2&4&2&2 \\ 2&2&2&2&2&4&2 \\ 2&2&2&2&2&2&4 \end{pmatrix}$$
The inverse of this matrix, i.e. [X'X]^{-1}, is

$$[X'X]^{-1} = \frac{1}{16}\begin{pmatrix} 7&-1&-1&-1&-1&-1&-1 \\ -1&7&-1&-1&-1&-1&-1 \\ -1&-1&7&-1&-1&-1&-1 \\ -1&-1&-1&7&-1&-1&-1 \\ -1&-1&-1&-1&7&-1&-1 \\ -1&-1&-1&-1&-1&7&-1 \\ -1&-1&-1&-1&-1&-1&7 \end{pmatrix}$$

This matrix shows that the estimates furnished by this design are correlated (i.e. the off-diagonal elements are not zero).
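The 7/16 and −1/16 pattern can be confirmed by forming X'X from the incidence matrix and multiplying by the stated inverse. As above, `X0` is our reconstruction of the printed matrix (an assumption), and the helper names are ours.

```python
from fractions import Fraction

X0 = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
    [1, 1, 0, 0, 1, 1, 0],
    [0, 1, 1, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [1, 1, 0, 1, 0, 0, 1],
]


def matmul(p, q):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*q)] for row in p]


Xt = [list(col) for col in zip(*X0)]
XtX = matmul(Xt, X0)                          # 4 on the diagonal, 2 elsewhere
claimed = [[Fraction(7 if i == j else -1, 16) for j in range(7)] for i in range(7)]
prod = matmul(XtX, claimed)
identity_ok = all(prod[i][j] == (1 if i == j else 0) for i in range(7) for j in range(7))
print(XtX[0], identity_ok)                    # [4, 2, 2, 2, 2, 2, 2] True
```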

By utilizing Banerjee's technique presented previously, the BIBD may be modified to provide orthogonal estimates. For this example t = r²/λ − b = 16/2 − 7 = 1; therefore one column of ones and only one row of zeroes (in that order) must be added to the BIBD design matrix (L7) to obtain orthogonal estimates.


This modified matrix is given as follows:

$$X = \begin{pmatrix} 1&0&0&0&0&0&0&0 \\ 1&1&0&1&0&1&0&1 \\ 1&0&1&1&0&0&1&1 \\ 1&0&0&0&1&1&1&1 \\ 1&1&1&0&0&1&1&0 \\ 1&0&1&1&1&1&0&0 \\ 1&1&0&1&1&0&1&0 \\ 1&1&1&0&1&0&0&1 \end{pmatrix}$$

The results corresponding to these eight weighing operations are:

$$Y = (0.00101,\ 10.76379,\ 7.83990,\ 6.11580,\ 12.08000,\ 8.90543,\ 10.63998,\ 12.80998)'$$

The first reading (0.00101) corresponds to a measurement with no objects on the scale.

The inverse of the matrix [X'X] is given as:

$$[X'X]^{-1} = \begin{pmatrix} 1 & -\frac14 & -\frac14 & \cdots & -\frac14 \\ -\frac14 & \frac12 & 0 & \cdots & 0 \\ -\frac14 & 0 & \frac12 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ -\frac14 & 0 & 0 & \cdots & \frac12 \end{pmatrix}_{8\times 8}$$

This matrix shows that under the modified BIBD the estimates are uncorrelated (i.e. the off-diagonal elements are zero), except for the first row and first column, which correspond to the scale bias.

The estimates of the scale bias and the seven weights under this modified set-up are:

b0 = 0.00101 (scale bias)
b1 = 5.85790
b2 = 3.52868
b3 = 1.78558
b4 = 1.94662
b5 = 1.64354
b6 = 1.04887
b7 = 1.47576

This example demonstrates how a BIBD can be modified to produce orthogonal estimates when t (= r²/λ − b) is an integer.
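The modified estimates can be reproduced by solving the normal equations exactly. Assumptions: `X0` is our reconstruction of the printed incidence matrix, and the empty-pan row (1, 0, ..., 0) is placed first, as the text states for the reading 0.00101. With t = 1 the bias estimate comes out exactly equal to the empty-pan reading.

```python
from fractions import Fraction

X0 = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
    [1, 1, 0, 0, 1, 1, 0],
    [0, 1, 1, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [1, 1, 0, 1, 0, 0, 1],
]
# Bias column of ones; the first row (1, 0, ..., 0) is the empty-pan weighing.
X = [[1] + [0] * 7] + [[1] + row for row in X0]
Y = [Fraction(s) for s in ("0.00101", "10.76379", "7.83990", "6.11580",
                           "12.08000", "8.90543", "10.63998", "12.80998")]


def solve(A, y):
    """Exact solution of A x = y by Gauss-Jordan elimination (A nonsingular)."""
    n = len(A)
    a = [[Fraction(x) for x in row] + [Fraction(rhs)] for row, rhs in zip(A, y)]
    for c in range(n):
        p = next(i for i in range(c, n) if a[i][c] != 0)
        a[c], a[p] = a[p], a[c]
        pivot = a[c][c]
        a[c] = [x / pivot for x in a[c]]
        for i in range(n):
            if i != c and a[i][c] != 0:
                f = a[i][c]
                a[i] = [x - f * z for x, z in zip(a[i], a[c])]
    return [row[n] for row in a]


m = len(X[0])
XtX = [[sum(row[i] * row[j] for row in X) for j in range(m)] for i in range(m)]
XtY = [sum(row[i] * y for row, y in zip(X, Y)) for i in range(m)]
est = solve(XtX, XtY)
print(est[0])                                  # 101/100000, the empty-pan reading
print([f"{float(e):.7f}" for e in est[1:]])
```

The seven object weights agree with the reported b1, ..., b7 to within 3 × 10⁻⁶.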
As an example of how a BIBD may be modified in a more general situation (whenever t is not an integer), consider the BIBD where v = 8, b = 14, r = 7, k = 4, and λ = 3. This BIBD will not, in its present form, provide orthogonal estimates. However, orthogonal estimates may be obtained as previously mentioned. For this example t = r²/λ − b = 49/3 − 14 = 7/3 (not an integer). The least integer ε such that (r + ε)² is divisible by (λ + ε) is 1. The value of n, (r + ε)²/(λ + ε) − (b + ε), also equals 1 for this example. Using these values, the modified weighing design is obtained by adding to the BIBD a column of ones, ε = 1 row of ones and n = 1 row of zeroes, in that order.


The results of the 16 weighings as specified by this design are represented by the vector Y and are given as:

$$Y = (18.91670,\ 0.00101,\ 13.12036,\ 12.49119,\ 10.32320,\ 10.32997,\ 5.79780,\ 6.42631,\ 8.59500,\ 8.58860,\ 12.07940,\ 10.76320,\ 11.08093,\ 6.83861,\ 8.15469,\ 7.83968)'$$


The matrix [X'X], corresponding to the modified BIBD, will have the form:

$$[X'X] = \begin{pmatrix} 16&8&8&8&8&8&8&8&8 \\ 8&8&4&4&4&4&4&4&4 \\ 8&4&8&4&4&4&4&4&4 \\ 8&4&4&8&4&4&4&4&4 \\ 8&4&4&4&8&4&4&4&4 \\ 8&4&4&4&4&8&4&4&4 \\ 8&4&4&4&4&4&8&4&4 \\ 8&4&4&4&4&4&4&8&4 \\ 8&4&4&4&4&4&4&4&8 \end{pmatrix}$$

The inverse of this matrix is given as:

$$[X'X]^{-1} = \begin{pmatrix} \frac{9}{16} & -\frac18 & -\frac18 & \cdots & -\frac18 \\ -\frac18 & \frac14 & 0 & \cdots & 0 \\ -\frac18 & 0 & \frac14 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ -\frac18 & 0 & 0 & \cdots & \frac14 \end{pmatrix}_{9\times 9}$$

We notice from this matrix that the estimates furnished by this design will be uncorrelated, since the off-diagonal elements are zero (except for the first row and first column, which correspond to the scale bias).

Solving the normal equations, we therefore get the following estimates of the weights of the objects:

b0 = 0.00132 (scale bias)
b1 = 5.85791
b2 = 3.52807
b3 = 1.78583
b4 = 1.94731
b5 = 1.64365
b6 = 1.04861
b7 = 1.47471
b8 = 1.62960
CHAPTER VI

A GENERALIZED PROCEDURE FOR MODIFYING BIBD TO PROVIDE
ORTHOGONAL ESTIMATES IN WEIGHING DESIGNS

For some balanced incomplete block designs the procedure developed by Banerjee [2] fails to provide orthogonal estimates when these modified BIBD are used as weighing designs. For example, in Fisher and Yates' Tables [4], the BIBD with reference number 15 has v = 10, b = 30, r = 9, k = 3, and λ = 2. These values give ε = 5 and hence n = (9 + 5)²/(2 + 5) − (30 + 5) = 28 − 35 = −7. For these situations, i.e., whenever a negative value is obtained for n, no procedure yet exists for modifying the corresponding BIBD to provide orthogonal estimates when the BIBD is used as a weighing design.

The remainder of this dissertation will be devoted to obtaining a general procedure for modifying all BIBD designs to provide orthogonal estimates. Several theorems that are directly related to this development will also be presented. In addition, the merits of this generalized procedure will be compared with those of Banerjee's procedure.

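As was noted by Banerjee [2], if a column of ones and t rows of zeroes, in that order, are added to the BIBD matrix, the matrix [X'X] will be of the form:

$$[X'X] = \begin{pmatrix} b+t & r & r & \cdots & r \\ r & r & \lambda & \cdots & \lambda \\ r & \lambda & r & \cdots & \lambda \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ r & \lambda & \lambda & \cdots & r \end{pmatrix}$$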

For this matrix Banerjee obtained the following:

1. $|X'X| = t(r-\lambda)^{v-1}\{r+\lambda(v-1)\}$.

2. The value of the determinant obtained after suppressing the first row and the first column of [X'X] is $(r-\lambda)^{v-1}\{r+\lambda(v-1)\}$.

3. The value of the determinant obtained after suppressing the second row and the second column of [X'X] is $(r-\lambda)^{v-2}\left[(b+t)\{r+\lambda(v-2)\} - r^2(v-1)\right]$.

4. The value of the determinant obtained after suppressing the first row and the second column of [X'X] is $r(r-\lambda)^{v-1}$.

5. The value of the determinant obtained after suppressing the second row and the third column of [X'X] is $(r-\lambda)^{v-2}\{\lambda(b+t) - r^2\}$.

Banerjee suggested that item 5 above (which would correspond to the off-diagonal elements in [X'X]^{-1}) should be identically zero (except for the
if k > 1. These conditions hold for BIBD's. Therefore the lemma is shown to be valid, i.e. t = r²/λ − b will always be positive.

The above lemma guarantees t to be positive. However, t does not necessarily have to be an integer. For these situations Banerjee, as was noted in Chapter V, suggested adding one column of ones, n rows of zeroes, and ε rows of ones in that order to the BIBD to provide orthogonal estimates, where n = (r + ε)²/(λ + ε) − (b + ε) and ε is the least positive integer such that (r + ε)²/(λ + ε) is an integer value. This procedure, however, does not guarantee n to be positive, and in fact we have shown an example at the beginning of this chapter where n is negative. We shall now present a method for modifying all BIBD's to produce orthogonal estimates in weighing designs.

If Banerjee's method is modified by adding one column of ones and only one row, where the first element of the row is √t and all other elements are zeroes, we obtain the modified BIBD design as


block design. This is consistent with the assumption made with respect to a spring balance design. We would make only one weighing operation on empty pans and multiply the corresponding reading on the scale by √t. It may, however, be remarked here that the structure of the variance-covariance matrix for e will also undergo a corresponding change, i.e. under Banerjee's procedure E(ee') = σ²I_n, whereas under this new procedure E[ee'] = Vσ².


Using this modification, the matrix [X'X] becomes:

$$[X'X]_{(v+1)\times(v+1)} = \begin{pmatrix} b+t & r & r & \cdots & r \\ r & r & \lambda & \cdots & \lambda \\ r & \lambda & r & \cdots & \lambda \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ r & \lambda & \lambda & \cdots & r \end{pmatrix}$$

which, since (√t)² = t, is identically the same as that obtained when t rows were added instead of the row (√t 0 0 ... 0). Since the [X'X] matrix is the same, [X'X]^{-1} will also be the same and will therefore provide orthogonal estimates as before. The advantage of adding only one row (√t 0 0 ... 0) to the BIBD is that fewer weighing operations are required and that this method will provide orthogonal estimates for all BIBD's, since t does not have to be an integer. For example, the BIBD with reference no. 15 in Fisher and Yates' tables [4] is given by

v = 10, b = 30, r = 9, k = 3, and λ = 2.

Previously there was no method for modifying this BIBD to provide orthogonal estimates. Using the new technique we get

$$t = \frac{r^2}{\lambda} - b = \frac{81}{2} - 30 = \frac{21}{2}.$$

We would now add a column of ones and a row (√(21/2), 0, 0, ..., 0), in that order, to the original BIBD and obtain the matrix [X'X] as follows:

$$[X'X] = \begin{pmatrix} \frac{81}{2} & 9 & 9 & \cdots & 9 \\ 9 & 9 & 2 & \cdots & 2 \\ 9 & 2 & 9 & \cdots & 2 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 9 & 2 & 2 & \cdots & 9 \end{pmatrix}_{11\times 11}$$

Banerjee's method and the new method is that Banerjee's method requires t additional weighing operations whenever t is an integer and ε + n additional weighings whenever t is not an integer, whereas the new method requires only one additional weighing operation. In addition, the variance-covariance matrix for e will be slightly different for Banerjee's method and the new method. Several questions arise concerning these differences, such as "What is the relative efficiency of the new procedure as compared to that of Banerjee's procedure?" and "What implications arise since the variance-covariance matrices are different?" These questions will be addressed in the following.

Under Banerjee's previous method (adding t rows) the weighing design model had the form Y = Xβ + e with E[e] = 0 and E[ee'] = σ²I_n. Under the new method (adding one row, √t 0 0 ... 0) the model becomes Y = Xβ + e with E[e] = 0 and E[ee'] = Vσ², where V has the form:

Since, as noted above, the covariances have different forms, it is appropriate to compare the relative efficiencies of the two methods. Relative efficiency is defined in this case as the ratio of the reciprocals of the variances.

First, to aid us in this development, the following theorem will be proven:

Theorem: In the estimation procedure under consideration, when E[ee'] = Vσ², the covariance of the estimators, Cov(β̂), obtained by the least squares procedure is identical to the covariance of the estimators obtained by the maximum likelihood procedure, even when X is not square.

(We know that for the estimation procedure under consideration, when the design matrix X is square, the covariance of the estimators obtained by the least squares procedure is identical to that obtained by the maximum likelihood procedure.)

Proof: We wish to show that L.S. Cov(β̂) = M.L. Cov(β̂), or

$$I = [\text{L.S. Cov}(\hat\beta)]\,[\text{M.L. Cov}(\hat\beta)]^{-1}.$$

We shall now determine Cov(β̂) by the least squares method and also by the maximum likelihood method.

1. By least squares.

$$\hat\beta = (X'X)^{-1}X'Y$$

$$\begin{aligned} \operatorname{Cov}(\hat\beta) &= E[(\hat\beta-\beta)(\hat\beta-\beta)'] \\ &= E\{[(X'X)^{-1}X'Y-\beta][(X'X)^{-1}X'Y-\beta]'\} \\ &= E\{[(X'X)^{-1}X'(X\beta+e)-\beta][(X'X)^{-1}X'(X\beta+e)-\beta]'\} \\ &= E\{[(X'X)^{-1}X'e][(X'X)^{-1}X'e]'\} \\ &= E[(X'X)^{-1}X'ee'X(X'X)^{-1}] \end{aligned}$$

or finally

$$\operatorname{Cov}(\hat\beta) = \sigma^2 (X'X)^{-1}X'VX(X'X)^{-1}.$$
2. Maximum likelihood approach.

The maximum likelihood estimator of β (assuming e is normally distributed) is β̂ = (X'V^{-1}X)^{-1}X'V^{-1}Y.

$$\begin{aligned} \operatorname{Cov}(\hat\beta) &= E[(\hat\beta-\beta)(\hat\beta-\beta)'] \\ &= E\{[(X'V^{-1}X)^{-1}X'V^{-1}Y-\beta][(X'V^{-1}X)^{-1}X'V^{-1}Y-\beta]'\} \\ &= E\{[(X'V^{-1}X)^{-1}X'V^{-1}(X\beta+e)-\beta][(X'V^{-1}X)^{-1}X'V^{-1}(X\beta+e)-\beta]'\} \\ &= E\{[(X'V^{-1}X)^{-1}X'V^{-1}X\beta + (X'V^{-1}X)^{-1}X'V^{-1}e - \beta][(X'V^{-1}X)^{-1}X'V^{-1}X\beta + (X'V^{-1}X)^{-1}X'V^{-1}e - \beta]'\} \\ &= E\{[(X'V^{-1}X)^{-1}X'V^{-1}e][(X'V^{-1}X)^{-1}X'V^{-1}e]'\} \\ &= E[(X'V^{-1}X)^{-1}X'V^{-1}ee'V^{-1}X(X'V^{-1}X)^{-1}] \\ &= \sigma^2(X'V^{-1}X)^{-1}X'V^{-1}VV^{-1}X(X'V^{-1}X)^{-1} \end{aligned}$$

or finally

$$\operatorname{Cov}(\hat\beta) = \sigma^2(X'V^{-1}X)^{-1}.$$

We would now like to show that

$$\sigma^2(X'V^{-1}X)^{-1} = \sigma^2(X'X)^{-1}X'VX(X'X)^{-1},$$

or

$$I = [(X'X)^{-1}X'VX(X'X)^{-1}][X'V^{-1}X].$$

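Under the new procedure the matrix X is determined by adding one column of ones and one row, √t 0 0 ... 0, in that order, to the BIBD design, where t = r²/λ − b, i.e.

The inverse of the corresponding matrix [X'X] will be given as:

$$(X'X)^{-1} = \begin{pmatrix} \frac{1}{t} & -a & \cdots & -a \\ -a & b & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ -a & 0 & \cdots & b \end{pmatrix},$$

where a = 1/(tk) and b = 1/(r − λ).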

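The matrix V will be as follows:

$$V = \begin{pmatrix} I & \mathbf{0} \\ \mathbf{0}' & t \end{pmatrix},$$

where 0 and I have the appropriate dimensions. Writing V in another form, V = I + M, where M is a null matrix except for the element t − 1 in the diagonal position of the added row. The inverse of V is given as:

$$V^{-1} = I - \frac{1}{t}M.$$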

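Using this notation, the maximum likelihood covariance, (X'V^{-1}X)^{-1}, becomes

$$(X'V^{-1}X)^{-1} = \left[X'\left(I - \tfrac{1}{t}M\right)X\right]^{-1} = \left(X'X - \tfrac{1}{t}X'MX\right)^{-1}.$$

The least squares covariance becomes

$$\begin{aligned} (X'X)^{-1}X'VX(X'X)^{-1} &= (X'X)^{-1}X'(I+M)X(X'X)^{-1} \\ &= (X'X)^{-1}(X'X)(X'X)^{-1} + (X'X)^{-1}(X'MX)(X'X)^{-1} \end{aligned}$$

and finally,

$$(X'X)^{-1}X'VX(X'X)^{-1} = (X'X)^{-1} + (X'X)^{-1}(X'MX)(X'X)^{-1}.$$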


To prove equality we now must show that

$$\begin{aligned} I &= \left[(X'X)^{-1} + (X'X)^{-1}X'MX(X'X)^{-1}\right]\left[X'X - \tfrac{1}{t}X'MX\right] \\ &= (X'X)^{-1}(X'X) - \tfrac{1}{t}(X'X)^{-1}X'MX + (X'X)^{-1}X'MX(X'X)^{-1}(X'X) - \tfrac{1}{t}(X'X)^{-1}X'MX(X'X)^{-1}X'MX \\ &= I + \left[-\tfrac{1}{t}(X'X)^{-1}X'MX + (X'X)^{-1}X'MX - \tfrac{1}{t}(X'X)^{-1}X'MX(X'X)^{-1}X'MX\right]. \end{aligned}$$

We now need only show that the term in brackets is zero, i.e.

$$-\tfrac{1}{t}(X'X)^{-1}X'MX + (X'X)^{-1}X'MX - \tfrac{1}{t}(X'X)^{-1}X'MX(X'X)^{-1}X'MX = 0.$$


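To simplify this expression, we obtain X'MX as:

$$X'MX = \begin{pmatrix} t(t-1) & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix},$$

so that (X'X)^{-1}X'MX has first column (t − 1, −at(t − 1), ..., −at(t − 1))' and zeros elsewhere, and finally the bracketed term becomes

$$\begin{pmatrix} -\frac{t-1}{t} + (t-1) - \frac{(t-1)^2}{t} & 0 & \cdots & 0 \\ \frac{at(t-1)}{t} - at(t-1) + \frac{at(t-1)^2}{t} & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ \frac{at(t-1)}{t} - at(t-1) + \frac{at(t-1)^2}{t} & 0 & \cdots & 0 \end{pmatrix}. \qquad (3)$$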


Reducing these expressions we get

$$-\frac{t-1}{t} + \frac{t-1}{1} - \frac{(t-1)^2}{t} = \frac{-t+1+t^2-t-t^2+2t-1}{t} = \frac{0}{t} = 0$$

and

$$\frac{at(t-1)}{t} - at(t-1) + \frac{at(t-1)^2}{t} = \frac{at(t-1) - at^2(t-1) + at(t-1)^2}{t} = \frac{at(t-1)(1-t+t-1)}{t} = \frac{at(t-1)(0)}{t} = 0.$$

Therefore expression (3) in fact reduces to the zero matrix and the theorem is proven.

As an example of this theorem consider the BIBD given by v = 4, b = 6, r = 3, k = 2, and λ = 1, for which t = r²/λ − b = 9 − 6 = 3. For this example the design matrix X, the transpose of this matrix, X', and the matrices V and V^{-1} are given as follows:

Or, upon multiplication,

$$\begin{pmatrix} 1 & -\frac12 & -\frac12 & -\frac12 & -\frac12 \\ -\frac12 & \frac23 & \frac16 & \frac16 & \frac16 \\ -\frac12 & \frac16 & \frac23 & \frac16 & \frac16 \\ -\frac12 & \frac16 & \frac16 & \frac23 & \frac16 \\ -\frac12 & \frac16 & \frac16 & \frac16 & \frac23 \end{pmatrix} \qquad (5)$$

We can see that expression (5) is identical to expression (4). Therefore, for the situation under consideration, the covariance obtained by the maximum likelihood method is the same as that obtained by the least squares method, even when X is not square.
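The v = 4, b = 6, r = 3, k = 2, λ = 1 example can be re-run exactly. Assumptions in this sketch: the BIBD is taken as all pairs of the four objects (one valid design with these parameters), the bias column comes first, and the added row (√t, 0, ..., 0) is handled through its outer product t e1e1', so every matrix stays rational. It compares the least squares covariance (X'X)^{-1}X'VX(X'X)^{-1} with the maximum likelihood covariance (X'V^{-1}X)^{-1}, both without the factor σ².

```python
from fractions import Fraction
from itertools import combinations


def inverse(m):
    """Exact Gauss-Jordan inverse of a nonsingular matrix."""
    n = len(m)
    a = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(m)]
    for c in range(n):
        p = next(i for i in range(c, n) if a[i][c] != 0)
        a[c], a[p] = a[p], a[c]
        pivot = a[c][c]
        a[c] = [x / pivot for x in a[c]]
        for i in range(n):
            if i != c and a[i][c] != 0:
                f = a[i][c]
                a[i] = [x - f * z for x, z in zip(a[i], a[c])]
    return [row[n:] for row in a]


def matmul(p, q):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*q)] for row in p]


v, b, r, k, lam = 4, 6, 3, 2, 1
t = Fraction(r * r, lam) - b                       # t = 3
# All pairs of 4 objects form a (4, 6, 3, 2, 1) BIBD; each row gets a leading bias 1.
rows = [[1] + [int(j in blk) for j in range(v)] for blk in combinations(range(v), k)]


def gram(added_weight):
    """X'WX with W = I on the BIBD rows and added_weight on the row (sqrt(t), 0, ..., 0),
    whose outer product is t * e1 e1'."""
    n = v + 1
    g = [[Fraction(sum(x[i] * x[j] for x in rows)) for j in range(n)] for i in range(n)]
    g[0][0] += added_weight * t
    return g


XtX_inv = inverse(gram(1))
ls_cov = matmul(matmul(XtX_inv, gram(t)), XtX_inv)    # (X'X)^{-1} X'VX (X'X)^{-1}
ml_cov = inverse(gram(1 / t))                           # (X'V^{-1}X)^{-1}
print(ls_cov == ml_cov)                                  # True
print(ml_cov[0])                 # first row: 1, -1/2, -1/2, -1/2, -1/2
```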

Having established that the covariance matrices obtained by the least squares approach and by the maximum likelihood approach are identical for BIBD designs modified into weighing designs that produce orthogonal estimates, we can proceed to compare the relative efficiencies of Banerjee's previous method and the new method. We would like to determine the covariance matrix, Cov(β̂), for both Banerjee's method and the new method. Cov(β̂) under Banerjee's method was given in Chapter V as

for those BIBD's that could be modified, i.e. either t was an integer or a positive value for n was obtainable.

For the new procedure (adding one row, √t 0 0 ... 0) the Cov(β̂) as given by the method of maximum likelihood (which was previously shown to also be equal to that derived by the method of least squares) is:

$$\operatorname{Cov}(\hat\beta) = \sigma^2(X'V^{-1}X)^{-1} \qquad (6)$$

where V^{-1} has the form

Substituting this expression into equation (6) we get
We see that X'MX = X'(MX), where X0 and X0' denote the original BIBD and the transpose of the original BIBD, respectively. The only nonzero row of MX is the added row, (√t(t − 1), 0, ..., 0), so simplifying we obtain

$$X'MX = \begin{pmatrix} t(t-1) & \mathbf{0}' \\ \mathbf{0} & O \end{pmatrix}.$$

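Substituting back into the original expression, we get

$$\operatorname{Cov}(\hat\beta) = \sigma^2\left[X'X - \begin{pmatrix} t-1 & \mathbf{0}' \\ \mathbf{0} & O \end{pmatrix}\right]^{-1}.$$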

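Writing this in another form and letting Z = X'X, we obtain

$$\operatorname{Cov}(\hat\beta) = \sigma^2\left[Z + \begin{pmatrix} -(t-1) \\ 0 \\ \vdots \\ 0 \end{pmatrix}\begin{pmatrix} 1 & 0 & \cdots & 0 \end{pmatrix}\right]^{-1} = \sigma^2[Z + UW]^{-1}.$$

Using the formula

$$(Z + UW)^{-1} = Z^{-1} - Z^{-1}U(I_m + WZ^{-1}U)^{-1}WZ^{-1},$$

we have for Cov(β̂) = σ²(X'V^{-1}X)^{-1}, with

$$Z^{-1} = (X'X)^{-1} = \begin{pmatrix} \frac{1}{t} & -a\mathbf{1}' \\ -a\mathbf{1} & bI \end{pmatrix},$$

where a = 1/(tk) and b = 1/(r − λ). Substituting this matrix for [X'X]^{-1}, we get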
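$$\operatorname{Cov}(\hat\beta) = \sigma^2\begin{pmatrix} 1 & -at & \cdots & -at \\ -at & b + a^2t(t-1) & \cdots & a^2t(t-1) \\ \vdots & \vdots & \ddots & \vdots \\ -at & a^2t(t-1) & \cdots & b + a^2t(t-1) \end{pmatrix},$$

or, substituting for a and b, we get

$$\operatorname{Cov}(\hat\beta) = \sigma^2\begin{pmatrix} 1 & -\frac{1}{k} & \cdots & -\frac{1}{k} \\ -\frac{1}{k} & \frac{1}{r-\lambda} + \frac{t-1}{tk^2} & \cdots & \frac{t-1}{tk^2} \\ \vdots & \vdots & \ddots & \vdots \\ -\frac{1}{k} & \frac{t-1}{tk^2} & \cdots & \frac{1}{r-\lambda} + \frac{t-1}{tk^2} \end{pmatrix}.$$

This expression represents Cov(β̂) for the situation where a BIBD is modified to produce orthogonal estimates in weighing designs by adding one column of ones and one row where the first element in the row is √t and all other elements are zeroes.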
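The closed form obtained through the inversion formula can be checked by multiplying it against X'V^{-1}X. Because the √t row is down-weighted by 1/t, it contributes exactly 1 to the corner, so X'V^{-1}X has b + 1 in the corner. This is a hedged sketch with our own helper names; the three parameter sets are the examples used in this chapter, including the fractional t = 7/3 and t = 21/2 cases.

```python
from fractions import Fraction


def matmul(p, q):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*q)] for row in p]


def xtvinvx(v, b, r, lam):
    """X'V^{-1}X: the sqrt(t) row, weighted by 1/t, adds 1 to the corner."""
    m = [[Fraction(b + 1)] + [Fraction(r)] * v]
    for i in range(v):
        m.append([Fraction(r)] + [Fraction(r if i == j else lam) for j in range(v)])
    return m


def closed_form_cov(v, b, r, k, lam):
    """Cov/sigma^2 = [[1, -1/k 1'], [-1/k 1, I/(r - lam) + (t - 1)/(t k^2) J]]."""
    t = Fraction(r * r, lam) - b
    off = (t - 1) / (t * k * k)
    m = [[Fraction(1)] + [Fraction(-1, k)] * v]
    for i in range(v):
        m.append([Fraction(-1, k)] +
                 [off + (Fraction(1, r - lam) if i == j else 0) for j in range(v)])
    return m


def is_inverse(v, b, r, k, lam):
    p = matmul(xtvinvx(v, b, r, lam), closed_form_cov(v, b, r, k, lam))
    return all(p[i][j] == (1 if i == j else 0) for i in range(v + 1) for j in range(v + 1))


for design in [(4, 6, 3, 2, 1), (8, 14, 7, 4, 3), (10, 30, 9, 3, 2)]:
    print(design, is_inverse(*design))
```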

Since the diagonal elements of the covariance matrix represent the variance factors, we have for Banerjee's previous method the variances of the estimates for t additional weighings equal to (using t = r(r − λ)/(λk))

$$\frac{1}{r-\lambda} = \frac{r}{tk\lambda},$$

whereas in the new method the variance of the estimates for only one additional weighing (√t 0 0 ... 0) is given by

$$\frac{1}{r-\lambda} + \frac{t-1}{tk^2} = \frac{rk+\lambda(t-1)}{tk^2\lambda}.$$

Using these values we obtain the relative efficiency (defined as the ratio of the reciprocals of the variances) as follows:

$$\text{Relative Efficiency} = \frac{1 \Big/ \dfrac{rk+\lambda(t-1)}{tk^2\lambda}}{1 \Big/ \dfrac{r}{tk\lambda}} = \frac{rk}{rk+\lambda(t-1)}.$$

Using this definition, the relative efficiencies of the new method as compared to Banerjee's method for those BIBD listed in Fisher and Yates' Tables [4] for which t is an integer are given in Table I. For those BIBD where t is not an integer the relative efficiencies are presented in Table II.
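The relative efficiency is a one-line computation once t is known. The sketch below (function name ours) computes it from the two variance factors and checks it against the closed form rk/(rk + λ(t − 1)) for the designs discussed in this chapter: RE = 1 when t = 1, RE < 1 when t > 1, and RE > 1 for v = 4, b = 4, r = 3, k = 3, λ = 2, where t = 0.5.

```python
from fractions import Fraction


def relative_efficiency(v, b, r, k, lam):
    """Var(Banerjee) / Var(new) for an object weight, from the variance factors above."""
    t = Fraction(r * r, lam) - b
    var_banerjee = Fraction(1, r - lam)
    var_new = var_banerjee + (t - 1) / (t * k * k)
    re = var_banerjee / var_new
    assert re == Fraction(r * k, r * k + lam * (t - 1))   # rk / (rk + lambda (t - 1))
    return t, re


for design in [(7, 7, 4, 4, 2), (8, 14, 7, 4, 3), (10, 30, 9, 3, 2), (4, 4, 3, 3, 2)]:
    t, re = relative_efficiency(*design)
    print(design, "t =", t, "RE =", re, f"({float(re):.4f})")
```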
TABLE I

RELATIVE EFFICIENCY OF THE NEW METHOD AS COMPARED
TO BANERJEE'S METHOD WHEN t IS AN INTEGER

Reference Number    t* = r²/λ − b    Relative Efficiency
       3                  3               0.8571
       5                  2               0.9474
       7                 14               0.6176
      13                 14               0.7111
      17                  3               0.9000
      18                 24               0.5400
      20                  2               0.9643
      21                 18               0.6792
      24                 12               0.8308
      25                 30               0.5085
      29                 18               0.7463
      36                 22               0.7237
      42                 15               0.6316
      43                  6               0.8276
      44                  3               0.9231
      46                  4               0.9000
      49                  3               0.9600
      61                  7               0.7447
      64                  5               0.8596
      66                 14               0.7636
      67                 12               0.8167
      71                 20               0.4412
      72                  6               0.7500
      75                  5               0.8182
      77                 10               0.7692
      78                  3               0.9375
      81                  2               0.9783
      82                  6               0.9091
      83                  5               0.9375
      85                  1               0.9615

* t indicates the number of additional weighings required by Banerjee's method. The new method requires only one additional weighing.


TABLE II

RELATIVE EFFICIENCY OF THE NEW METHOD AS COMPARED
TO BANERJEE'S METHOD WHEN t IS NOT AN INTEGER

Reference Number    t = r²/λ − b    n*    ε**    (n + ε)***    Relative Efficiency
       1                2.50         1     1         2               0.8333
       2                1.50         0     1         1               0.9615
       6                2.33         1     1         2               0.8753
       9                1.33         0     1         1               0.9802
      11                3.33         0     2         2               0.8207
      16                2.25         1     1         2               0.9000
      19                1.25         0     1         1               0.9878

* n denotes the number of additional weighings with no objects on the scale.

** ε denotes the number of additional weighings with all the objects on the scale.

*** n + ε indicates the total number of additional weighings required to provide orthogonal estimates by Banerjee's method.
Tables I and II indicate the following:

a. Whenever t is an integer, Banerjee's method produces smaller variances than the new method. (In all situations in Table I the relative efficiency was less than 1.0.)

b. Whenever t is not an integer and ε along with a positive n could be found, Banerjee's method again provided smaller variances. (In all situations in Table II the relative efficiencies were less than 1.0.)

However, whenever Banerjee's method produces a negative n, e.g. reference numbers 15, 22, 23, 26, and 27 in [4], or whenever a value for ε could not be determined, e.g. the design given as v = 4, b = 4, r = 3, k = 3, and λ = 2, the new method is the only available method for modifying a BIBD to provide orthogonal estimates in weighing designs.

It is of interest to note that in the design given as v = 4, b = 4, r = 3, k = 3, and λ = 2, the value of t is 0.5. For these situations, i.e. whenever t is less than 1.0, the relative efficiency rk/(rk + λ(t − 1)), when compared to Banerjee's method, is greater than 1, because λ(t − 1) is then negative. That is, for those designs where t is less than 1.0, the new method provides smaller variances than Banerjee's method.

Before we compared the relative efficiency of the new method to Banerjee's method, we had shown that the covariance matrix for our estimation procedure obtained by the least squares approach was identical to that obtained by the maximum likelihood approach. Since these covariance matrices for the least squares and maximum likelihood approaches are the same, one may wonder if the estimators of β for the least squares and maximum likelihood approaches are identical, i.e. does (X'X)^{-1}X'Y (least squares estimator) = (X'V^{-1}X)^{-1}X'V^{-1}Y (maximum likelihood estimator)? This will be shown in the following theorem.

Theorem: In the estimation procedure under consideration, with $E(\varepsilon\varepsilon') = V\sigma^2$, the estimators obtained by the least squares procedure are identical to the estimators obtained by the maximum likelihood procedure.

Proof: We wish to show that $(X'X)^{-1}X'Y$ (the least squares estimator of θ) $= (X'V^{-1}X)^{-1}X'V^{-1}Y$ (the maximum likelihood estimator of θ), i.e. we wish to show that

$$(X'X)^{-1}X'Y = (X'V^{-1}X)^{-1}X'V^{-1}Y,$$

or, since this must hold for every $Y$,

$$(X'X)^{-1}X' = (X'V^{-1}X)^{-1}X'V^{-1}. \qquad (1)$$

The  method  of  proving  equality  is  to  show  that  the  left  and  right 
sides  of  expression  (1)  reduce  to  the  same  expression.  The  left  side 
of  expression  (1)  reduces  to 


$$(X'X)^{-1}X' = [\ldots]$$

[matrix illegible in the source]

where $a$ and $c$ are scalar constants [definitions illegible in the source] and $X_0$ is the original BIBD. On multiplication we get


$$(X'X)^{-1}X' = [\ldots] \qquad (2)$$

[matrix illegible in the source; its rows are made up of terms $(c - a)$ and $(-a)$, in groups of $r$ and $b - r$ terms]


The arrangement of these terms is determined from the $X_0'$ matrix, i.e. for every 0 and 1 in all rows of $X_0'$ there will respectively be $(c - a)$ and $(-a)$ in all rows of (2).


Substituting for $a$ and $c$ we get

$$(X'X)^{-1}X' = [\ldots] \qquad (3)$$

[matrix illegible in the source; legible entries include $(r - \lambda)/(tk\lambda)$, $1/(tk)$ and 0, arranged in groups of $r$ and $b - r$ terms]


The right side of expression (1) reduces to

[matrix illegible in the source]


The arrangement of these terms is again determined from the $X_0'$ matrix. Therefore the arrangement of these terms will be identical to that in the matrix given in expression (2).

Substituting for $e$ and $d$, we get

[matrix illegible in the source]   (4)


We see that expressions (3) and (4) are identical; therefore, for our estimation procedure, $(X'X)^{-1}X' = (X'V^{-1}X)^{-1}X'V^{-1}$, and the theorem is proven.
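The theorem's conclusion can be illustrated numerically. The sketch below does not reproduce the paper's weighing-design matrix or its V, which are illegible here; instead it assumes, purely for illustration, that X is the 6 × 4 block-by-variety incidence matrix of the (4, 6, 3, 2, 1) BIBD and that V = I + XBX' for a hypothetical positive semidefinite B. Under that assumption VX = X(I + BX'X) lies in the column space of X, a standard sufficient condition for the least squares and generalized (maximum likelihood) estimators to coincide. All function names are illustrative.

```python
import itertools

import numpy as np


def bibd_incidence(v, k):
    """Block-by-variety 0/1 matrix whose rows are all k-subsets of v varieties.

    For v = 4, k = 2 this is the (4, 6, 3, 2, 1) design of the example.
    """
    blocks = list(itertools.combinations(range(v), k))
    X = np.zeros((len(blocks), v))
    for row, block in enumerate(blocks):
        X[row, list(block)] = 1.0
    return X


def least_squares_map(X):
    """Return (X'X)^{-1} X'."""
    return np.linalg.solve(X.T @ X, X.T)


def max_likelihood_map(X, V):
    """Return (X'V^{-1}X)^{-1} X'V^{-1} for a symmetric positive definite V."""
    Vinv_X = np.linalg.solve(V, X)  # V^{-1} X; its transpose is X'V^{-1}
    return np.linalg.solve(X.T @ Vinv_X, Vinv_X.T)


X = bibd_incidence(4, 2)
B = np.diag([0.5, 1.0, 0.25, 2.0])    # hypothetical positive semidefinite matrix
V = np.eye(X.shape[0]) + X @ B @ X.T  # hypothetical V with VX = X(I + BX'X)
print(np.allclose(least_squares_map(X), max_likelihood_map(X, V)))  # True
```

For a V that does not satisfy this column-space condition the two maps generally differ, which is why the theorem depends on the particular structure of the estimation procedure.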
As an example of the above theorem, consider the BIBD given by v = 4, b = 6, r = 3, k = 2, and λ = 1.

The least squares estimator of θ is given as:

$$\hat\theta = (X'X)^{-1}X'Y = [\ldots] \qquad (1)$$

[numerical matrices illegible in the source]

The maximum likelihood estimator of θ is given as:

$$\tilde\theta = (X'V^{-1}X)^{-1}X'V^{-1}Y = [\ldots] \qquad (2)$$

[numerical matrices illegible in the source]
We see that expressions (1) and (2), i.e. the least squares and maximum likelihood estimators for θ, are identical.
COMPUTER CONSTRUCTION OF BALANCED INCOMPLETE BLOCK DESIGNS

Malcolm  S.  Taylor 

US  Army  Ballistic  Research  Laboratories 
Aberdeen  Proving  Ground,  Maryland 

ABSTRACT. Numerous papers on the construction of BIBD's (Balanced Incomplete Block Designs) appearing in the literature consist of derivations of a set of base blocks with symmetrically repeated differences. The inherent properties of the algebraic or geometric structures that are employed lead to sharply constrained values of the design parameters. There would appear, therefore, to be some interest in a more efficient computational scheme to discriminate readily between a set of blocks which have symmetrically repeated differences and those which do not. This is the topic of this paper, and although the values of the parameters are limited in magnitude by computational considerations, no restrictive parameter relationships are involved, and a number of new design configurations are presented.

INTRODUCTION. A BIBD is an arrangement of v distinct elements (varieties) into b sets (blocks) of exactly k distinct members, such that each element occurs in exactly r different blocks and every pair of distinct elements occurs together in exactly λ blocks. The parameters v, b, r, k, λ which characterize a BIBD satisfy the fundamental relations

$$bk = vr \qquad (1.1)$$

and

$$r(k-1) = \lambda(v-1) \qquad (1.2)$$
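Relations (1.1) and (1.2) give a quick screen on candidate parameter sets. A minimal Python sketch, applied to the seven-point example below and to the parameter sets listed in Table I (the function name is illustrative):

```python
def satisfies_fundamental_relations(v, b, r, k, lam):
    """Check (1.1) bk = vr and (1.2) r(k - 1) = lam(v - 1).

    Both are necessary conditions for a (v, b, r, k, lam) BIBD; passing
    them does not by itself guarantee that the design exists.
    """
    return b * k == v * r and r * (k - 1) == lam * (v - 1)


# The seven-point example and the parameter sets of Table I, as (v, b, r, k, lam).
DESIGNS = [
    (7, 7, 3, 3, 1),
    (17, 68, 16, 4, 3),
    (19, 57, 18, 6, 5),
    (11, 44, 20, 5, 8),
    (11, 55, 20, 4, 6),
    (17, 68, 20, 5, 5),
]

for params in DESIGNS:
    print(params, satisfies_fundamental_relations(*params))  # all True
print(satisfies_fundamental_relations(7, 7, 3, 3, 2))         # False
```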

For example,

[0, 1, 3]
[1, 2, 4]
[2, 3, 5]
[3, 4, 6]
[4, 5, 0]
[5, 6, 1]
[6, 0, 2]

is a BIBD with parameters v = b = 7, r = k = 3, λ = 1.

Notice that if the block [0, 1, 3] were specified, the entire design could be generated by successively adding (modulo v) the non-zero residue classes to each element of the specified (or base) block.
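Developing a base block is easy to sketch in Python. The helper names below are illustrative; the check confirms that developing [0, 1, 3] modulo 7 gives the seven blocks listed above and that every pair of varieties occurs together in exactly λ = 1 block.

```python
from collections import Counter
from itertools import combinations


def develop(base_block, v):
    """The base block (t = 0) plus its translates by the non-zero residues t (mod v)."""
    return [sorted((a + t) % v for a in base_block) for t in range(v)]


def pair_counts(blocks):
    """How many blocks contain each unordered pair of distinct varieties."""
    counts = Counter()
    for block in blocks:
        counts.update(combinations(sorted(block), 2))
    return counts


design = develop([0, 1, 3], 7)
print(design)
counts = pair_counts(design)
print(len(counts), set(counts.values()))  # 21 {1}
```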

Bose [1] presented a technique for generating a BIBD directly from a set of blocks, called base blocks, when a sufficient condition known as symmetrically repeated differences is satisfied. Within the scope of this investigation, symmetrically repeated differences means that the totality of the inner-block differences of the base block elements modulo v results in the occurrence of every non-zero element exactly λ times.

The k(k−1) modular differences of [0, 1, 3] are (mod 7)

0 − 1 ≡ 6    1 − 0 = 1    3 − 0 = 3
0 − 3 ≡ 4    1 − 3 ≡ 5    3 − 1 = 2

and so [0, 1, 3] is a base block for a BIBD with λ = 1.
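As a brute-force reference point for the matrix test developed below, the differences can simply be counted directly. A minimal Python sketch (function names are illustrative) that tallies all k(k−1) ordered differences of a set of blocks and checks that every non-zero residue occurs exactly λ times:

```python
from collections import Counter


def modular_differences(blocks, v):
    """All ordered differences a - b (mod v), a != b, within each block."""
    diffs = Counter()
    for block in blocks:
        for a in block:
            for b in block:
                if a != b:
                    diffs[(a - b) % v] += 1
    return diffs


def symmetrically_repeated(blocks, v, lam):
    """True if every non-zero residue mod v occurs exactly lam times."""
    diffs = modular_differences(blocks, v)
    return all(diffs[d] == lam for d in range(1, v))


print(sorted(modular_differences([[0, 1, 3]], 7).items()))
print(symmetrically_repeated([[0, 1, 3]], 7, 1))  # True
print(symmetrically_repeated([[0, 1, 2]], 7, 1))  # False
```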

Since  each  block  gives  rise  to  k(k-l)  modular  differences,  it  is 
desirable  to  discriminate  readily  between  a  set  of  blocks  which  have 
symmetrically  repeated  differences  and  those  which  do  not. 

One can characterize a block of a BIBD with parameters v, b, r, k, λ as a vector of dimension v with elements 0 or 1, where the presence of a variety i is indicated by a 1 in the (i+1)th position, and zero otherwise.

For example, the block [0, 3, 4, 7] in a design with v = 9, k = 4 is uniquely represented by the vector x = (1, 0, 0, 1, 1, 0, 0, 1, 0).

Notice that the varieties are represented by the residue classes modulo v.

2. A NECESSARY AND SUFFICIENT CONDITION FOR SYMMETRICALLY REPEATED DIFFERENCES

If we consider the $v \times v$ matrix

$$M_1 = \begin{pmatrix} 1 & 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \ddots & \vdots \\ 0 & \cdots & & 0 & 1 & 1 \\ 1 & 0 & \cdots & & 0 & 1 \end{pmatrix} \qquad (2.1)$$

and form the product $M_1x = b$, then to every pair of elements $a_i$, $a_j$ in the block represented by $x$ whose difference $a_i - a_j \equiv 1 \pmod v$ there corresponds a 2 in the resultant vector $b$. Conversely, the number of 2's in the vector $b$ is precisely the number of differences congruent to 1 (mod v) that would occur if the totality of differences modulo v of the elements of the block were computed.

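Similarly the matrix

$$M_2 = \begin{pmatrix} 1 & 0 & 1 & 0 & \cdots & 0 \\ 0 & 1 & 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & & \ddots & \vdots \\ 1 & 0 & \cdots & 0 & 1 & 0 \\ 0 & 1 & 0 & \cdots & 0 & 1 \end{pmatrix} \qquad (2.2)$$

upon multiplying the vector $x$ will cause a 2 to occur in the resultant $b$ for every difference of elements of the block $a_i - a_j \equiv 2 \pmod v$.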

It is now apparent how the construction should proceed. We form successively the products $M_1x, M_2x, \ldots, M_{[v/2]}x$; if exactly λ twos occur in $b$ at each stage, then the block differences are symmetrically repeated. This constitutes a necessary and sufficient condition for a block to have symmetrically repeated differences.
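The stage-by-stage test can be sketched in Python with NumPy. Here $M_m$ is built as the $v \times v$ matrix with ones at $(i, i)$ and $(i, i + m \bmod v)$, which reproduces (2.1) for m = 1 and (2.2) for m = 2; the general form for larger m follows the row-sum rule given after Corollary 2.1. Function names are illustrative, and positions are 0-based, so variety i sits at index i.

```python
import numpy as np


def representative(block, v):
    """0/1 vector of length v with a 1 at (0-based) index i for each variety i."""
    x = np.zeros(v, dtype=int)
    x[[a % v for a in block]] = 1
    return x


def shift_matrix(v, m):
    """M_m: ones at (i, i) and (i, i + m mod v); m = 1, 2 give (2.1), (2.2)."""
    M = np.eye(v, dtype=int)
    M[np.arange(v), (np.arange(v) + m) % v] += 1
    return M


def count_twos(block, v, m):
    """Number of 2s in M_m x, i.e. the number of differences congruent to m (mod v)."""
    b = shift_matrix(v, m) @ representative(block, v)
    return int(np.sum(b == 2))


def block_has_srd(block, v, lam):
    """Exactly lam twos at every stage m = 1, ..., [v/2]."""
    return all(count_twos(block, v, m) == lam for m in range(1, v // 2 + 1))


print(representative([0, 3, 4, 7], 9))                   # [1 0 0 1 1 0 0 1 0]
print([count_twos([0, 1, 3], 7, m) for m in (1, 2, 3)])  # [1, 1, 1]
print(block_has_srd([0, 1, 3], 7, 1))                    # True
```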

One need never proceed further than [v/2] steps, since the existence of a pair of elements $a_i$, $a_j$ with $a_i - a_j \equiv n \pmod v$ implies $a_j - a_i \equiv v - n \pmod v$. The generalization from a single block to a set of blocks consists merely of replacing the vector $x$ in the product $Mx$ by $X = (x_1, x_2, \ldots)$, which denotes a partitioned matrix, each column of which is the representative of a block.

We  proceed  to  summarize  this  observation  as 
THEOREM 2.1 Consider a block $B_i = [a_1, \ldots, a_k]$, with elements represented by the congruence classes modulo v, and its representative $x_i$. A necessary and sufficient condition that $B_i$ satisfy the property of symmetrically repeated differences is that the products $M_jx_i$, $j = 1, 2, \ldots, [v/2]$, each contain the same number, λ, of twos in the resultant vector.


Generalizing from a single block $B_i$ to a set of blocks $\{B_i\}$ we obtain the immediate

COROLLARY 2.1 A necessary and sufficient condition that a set of blocks $\{B_i\}$ constitute a base for a $(v, b, r, k, \lambda)$-design is that the products $M_1X, M_2X, \ldots, M_{[v/2]}X$ contain exactly λ twos in each of the resultant matrices.

Observe that the matrices $M_2, \ldots, M_{[v/2]}$ may be obtained from $M_1$ in the following fashion: row $i$ of matrix $M_m$, $2 \le m \le [v/2]$, is precisely the vector sum, modulo 2, of rows $i, i+1, \ldots, i+m-1 \pmod v$ of matrix $M_1$. This observation is quite useful when implementing this technique for automatic computation, since matrices $M_2, M_3, \ldots$ can be generated internally.
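The row-sum rule can be checked numerically. In the sketch below (illustrative names), the mod-2 sum of rows i, ..., i+m−1 of $M_1$ telescopes: interior columns appear twice and cancel, leaving ones only at columns i and i+m (mod v). The result is compared with a directly constructed $M_m$ for several v, including even v where m = v/2.

```python
import numpy as np


def m1_matrix(v):
    """M_1 of (2.1): ones on the diagonal and at (i, i + 1 mod v)."""
    M = np.eye(v, dtype=int)
    M[np.arange(v), (np.arange(v) + 1) % v] = 1
    return M


def mm_from_m1(M1, m):
    """Row i of M_m = sum of rows i, i+1, ..., i+m-1 (mod v) of M_1, taken mod 2."""
    v = M1.shape[0]
    Mm = np.zeros_like(M1)
    for i in range(v):
        rows = [(i + j) % v for j in range(m)]
        Mm[i] = M1[rows].sum(axis=0) % 2
    return Mm


def mm_direct(v, m):
    """Reference construction: ones at (i, i) and (i, i + m mod v)."""
    M = np.eye(v, dtype=int)
    M[np.arange(v), (np.arange(v) + m) % v] = 1
    return M


v = 9
print(all(np.array_equal(mm_from_m1(m1_matrix(v), m), mm_direct(v, m))
          for m in range(2, v // 2 + 1)))  # True
```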


If one considers a block as a set of k beads on a necklace of v positions, we have, for example, for the block [0, 1, 3] the following representation:

[Figure: the block [0, 1, 3] drawn as three beads on a necklace of seven positions]

Notice that in computing modular differences what is important is not the labels we have attached to the varieties, but their positions relative to each other. In other words, blocks [0, 1, 3] ≡ [1, 2, 4] ≡ ... ≡ [6, 0, 2] are all equivalent, since each gives rise to the same set of modular differences, and as such each could serve as a base block for the BIBD. These blocks are all members of the same equivalence class, or orbit.

Clearly, when constructing designs from base blocks, we want to consider as candidates only blocks in distinct orbits. Toward this end a convenient way to characterize an orbit is to notice the 1-1 correspondence between an orbit and a circular partition of the integer v; e.g.,

[0, 1, 3] ↔ 1 + 2 + 4

where the summands are simply the "distance" between adjacent beads (varieties). If one generates distinct circular partitions, a task to which the computer is well suited, one is equivalently generating representatives of distinct equivalence classes.
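A minimal Python sketch of this enumeration, assuming "orbit" means the set of translates of a k-element block modulo v. Each ordered sum of v into k positive parts is labelled by its lexicographically smallest cyclic rotation, and one base-block candidate [0, g1, g1 + g2, ...] is kept per label. The function names are illustrative; the paper's own generation scheme is not given here.

```python
from itertools import combinations


def compositions(v, k):
    """All ordered ways to write v as a sum of k positive integers."""
    for cuts in combinations(range(1, v), k - 1):  # k - 1 cut points in 1..v-1
        bounds = (0,) + cuts + (v,)
        yield tuple(bounds[i + 1] - bounds[i] for i in range(k))


def canonical_rotation(parts):
    """Lexicographically smallest cyclic rotation, used as the orbit label."""
    return min(parts[i:] + parts[:i] for i in range(len(parts)))


def orbit_representatives(v, k):
    """One candidate block [0, g1, g1 + g2, ...] per distinct circular partition of v."""
    seen, reps = set(), []
    for parts in compositions(v, k):
        key = canonical_rotation(parts)
        if key not in seen:
            seen.add(key)
            block, pos = [], 0
            for g in key[:-1]:
                block.append(pos)
                pos += g
            block.append(pos)
            reps.append(block)
    return reps


print(orbit_representatives(7, 3))  # [[0, 1, 2], [0, 1, 3], [0, 1, 4], [0, 1, 5], [0, 2, 4]]
```

For v = 7, k = 3 this yields five candidates instead of all 35 three-element blocks; [0, 1, 3] appears as the representative of the partition 1 + 2 + 4.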

3. A COMPUTATIONAL ALGORITHM

If we denote the $i$th row of the matrix $X$ as $a_i = (a_{i1}, a_{i2}, \ldots)$, then $X$ can be represented as

$$X = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_v \end{pmatrix}$$

In this compact notation

$$M_nX = \begin{pmatrix} a_1 + a_{1+n} \\ a_2 + a_{2+n} \\ \vdots \\ a_v + a_{v+n} \end{pmatrix} \qquad (3.1)$$

where $a_i + a_j$ represents the usual component-wise addition of the row vectors $a_i$, $a_j$. Notice that if the subscript $i + n > v$ in (3.1), then $i + n$ is to be read as $i + n \pmod v$. We compute successively $M_nX$, $n = 1, 2, \ldots, [v/2]$, and terminate the procedure as soon as the required number, λ, of twos fails to appear. Otherwise, completion of the process, indicated here by $n$ taking its maximum value $[v/2]$ without rejection, is sufficient to establish the blocks represented by $X$ as base blocks generating a BIBD.

Since addition can be performed much more rapidly than multiplication by the computer, in practice we compute $M_nX$ additively, as expressed in the right-hand side of (3.1), rather than performing the actual matrix multiplication indicated in Section 2. By a judicious selection of candidates for base blocks, it may be possible to determine a set of blocks generating a $(v, b, r, k, \lambda)$-design. Some solutions for BIBD's with large replicates determined in this manner are presented in Table I.
TABLE I

Parameters (v, b, r, k, λ)    Base Blocks

(17, 68, 16, 4, 3)    [0, 1, 2, 4], [0, 1, 7, 10], [0, 2, 6, 11], [0, 3, 7, 12]
(19, 57, 18, 6, 5)    [0, 1, 2, 3, 5, 10], [0, 1, 3, 7, 11, 14], [0, 1, 5, 7, 11, 14]
(11, 44, 20, 5, 8)    [0, 1, 2, 3, 5], [0, 1, 2, 4, 7], [0, 1, 3, 6, 7], [0, 1, 4, 6, 8]
(11, 55, 20, 4, 6)    [0, 1, 2, 3], [0, 1, 3, 6], [0, 1, 4, 7], [0, 1, 5, 7], [0, 2, 4, 7]
(17, 68, 20, 5, 5)    [0, 1, 2, 3, 6], [0, 1, 3, 8, 11], [0, 1, 5, 9, 12], [0, 2, 6, 10, 12]
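The additive procedure of Section 3 can be sketched in Python with NumPy and run against the base blocks of Table I. Row $i$ of $M_nX$ is formed as $a_i + a_{(i+n) \bmod v}$ with a cyclic shift rather than a matrix product, and the loop stops at the first $n$ whose result does not contain exactly λ twos. The names are illustrative, and the block-count check assumes each base block develops into v distinct blocks.

```python
import numpy as np

# Table I as listed above: (v, b, r, k, lam) -> base blocks.
TABLE_I = {
    (17, 68, 16, 4, 3): [[0, 1, 2, 4], [0, 1, 7, 10], [0, 2, 6, 11], [0, 3, 7, 12]],
    (19, 57, 18, 6, 5): [[0, 1, 2, 3, 5, 10], [0, 1, 3, 7, 11, 14], [0, 1, 5, 7, 11, 14]],
    (11, 44, 20, 5, 8): [[0, 1, 2, 3, 5], [0, 1, 2, 4, 7], [0, 1, 3, 6, 7], [0, 1, 4, 6, 8]],
    (11, 55, 20, 4, 6): [[0, 1, 2, 3], [0, 1, 3, 6], [0, 1, 4, 7], [0, 1, 5, 7], [0, 2, 4, 7]],
    (17, 68, 20, 5, 5): [[0, 1, 2, 3, 6], [0, 1, 3, 8, 11], [0, 1, 5, 9, 12], [0, 2, 6, 10, 12]],
}


def block_matrix(blocks, v):
    """v x s matrix X whose j-th column represents the j-th block."""
    X = np.zeros((v, len(blocks)), dtype=np.int8)
    for j, block in enumerate(blocks):
        X[[a % v for a in block], j] = 1
    return X


def is_base(blocks, v, lam):
    """Additive form of (3.1), with early termination on the first failing n."""
    X = block_matrix(blocks, v)
    for n in range(1, v // 2 + 1):
        S = X + np.roll(X, -n, axis=0)  # row i: a_i + a_{(i + n) mod v}
        if np.count_nonzero(S == 2) != lam:
            return False
    return True


for (v, b, r, k, lam), blocks in TABLE_I.items():
    # Second flag: number of base blocks times v equals b (full development).
    print((v, b, r, k, lam), is_base(blocks, v, lam), len(blocks) * v == b)
```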
ON SPURIOUS CORRELATIONS FOR PARTIALLY RELATED VARIATES


Oskar  M.  Essenwanger 

Physical  Sciences  Directorate 
US  Army  Missile  Research,  Development 
and  Engineering  Laboratory 
US  Army  Missile  Command 
Redstone  Arsenal,  Alabama 


ABSTRACT. The spurious correlation coefficient between related objects such as a(a ± b) is well known, although sometimes overlooked. Mostly unnoticed, however, goes the spurious correlation when only a subset of the material variates is identical.

The respective formulae for the spurious correlation coefficient are developed for the case of correlation between wind profile characteristics of the lower tropospheric layers and of the atmosphere up to 25 km. Significance testing of the (linear) correlation coefficient against these spurious correlations is described and demonstrated by the wind profile analysis for four typical climatic zones.

Although the method has been developed primarily for the analysis of wind profiles and their physical interpretation, the statistical methodology has general validity.

INTRODUCTION. It is well known that a spurious correlation coefficient is produced when the correlated variates are related, such as $x_1 = a$ and $x_2 = a \pm b$, where $x_1$ and $x_2$ represent the first and second data sets, respectively. It is sometimes overlooked that a spurious correlation also appears when a subset of the data from which $x_1$ and $x_2$ are formed is related. We may call this case a "partial" spurious correlation.

The particular instance arises in the correlation between characteristic coefficients of wind profiles from overlapping layers, when $x_1$ is taken, e.g., for the surface to 3 km profile and $x_2$ is computed for the surface to 25 km layer. Although the variates may apparently not be directly related in the sense of an a(a ± b) multiplication, the parameters are based on material data of which one part is a subset of the total. The appropriate equations will be developed in the following sections.


It should be pointed out that the partial spurious correlation coefficient is not automatically useless for practical applications. Its merit depends on the problem to be solved. When the significance of the coefficient is tested against the hypothesis of zero correlation, one must be aware that significance may be caused by the interrelation between the two variables and not by a physical cause. When the independent additional correlation is the question, the test basis should be the spurious correlation coefficient rather than the zero value. This may not preclude the utilization of the spurious relationship for inter- or extrapolation, prediction, etc., but the user should be aware that probably no new information is gained in addition to that already available from the lower layer. This can be checked as discussed later.


The  following  pages  have  been  reproduced  photographically  from  the 
author's  manuscript. 




We now have the choice of notation for the term $y$. We may select either one of the subsequent expressions, eqn. 3a or 3b.

First we state

$$x = \sum_{h=1}^{h_1} v_h/n_x = V_1 \qquad (2a)$$

which provides

$$y = \sum_{h=1}^{h_2} v_h/n_y \qquad (2b)$$

or

$$y = V_1n_x/n_y + V_2 \qquad (2c)$$

$$y = w_1V_1 + V_2 \qquad (3a)$$

Then we define

$$\bar V_2 = \sum_{h=h_1+1}^{h_2} v_h/(n_y - n_x)$$

where $\bar V_2$ is the regular average of the top layer. Accordingly

$$y = V_1n_x/n_y + (n_y - n_x)\bar V_2/n_y \qquad (2d)$$

or

$$y = w_1V_1 + w_2\bar V_2 \qquad (3b)$$

We have denoted $w_1 = n_x/n_y$ and $w_2 = (n_y - n_x)/n_y$.

The two forms (eqn. 3a and 3b) are equivalent, but the subsequent development has been written for the $\bar V_2$. The equations can be readily brought into the first form by setting $w_2 = 1$.

The definition of $x$ and $y$ can be expanded to $\bar x$ and $\bar y$, the mean values, with

$$\bar x = \bar v_1 \qquad (4a)$$

$$\bar y = w_1\bar v_1 + w_2\bar v_2 \qquad (4b)$$

or

$$\bar y = w_1\bar v_1 + \bar v_2 \qquad (4c)$$
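As customary, the linear correlation coefficient can be written

$$r_{xy} = \mathrm{Cov}_{xy}/(\sigma_x\sigma_y) \qquad (5)$$

with

$$\mathrm{Cov}_{xy} = \left[\sum(x - \bar x)(y - \bar y)\right]/N \qquad (6a)$$

$$\sigma_x^2 = \left[\sum(x - \bar x)^2\right]/N \qquad (6b)$$

$$\sigma_y^2 = \left[\sum(y - \bar y)^2\right]/N \qquad (6c)$$

Now

$$\sigma_x^2 = \sum(v_1 - \bar v_1)^2/N = \sigma_1^2 \qquad (6d)$$

$$\sigma_y^2 = \sum\left[w_1(v_1 - \bar v_1) + w_2(v_2 - \bar v_2)\right]^2/N = w_1^2\sigma_1^2 + 2w_1w_2r_{12}\sigma_1\sigma_2 + w_2^2\sigma_2^2 \qquad (6e)$$

where $r_{12}$ is the true correlation between $v_1$ and $v_2$, and $\sigma_2^2$ the variance of the layer $h_1 + 1$ through $h_2$, namely $\sigma^2_{v_2}$.

$$N\,\mathrm{Cov}_{xy} = \sum(v_1 - \bar v_1)\left[w_1(v_1 - \bar v_1) + w_2(v_2 - \bar v_2)\right]$$

$$\mathrm{Cov}_{xy} = w_1\sigma_1^2 + w_2r_{12}\sigma_1\sigma_2 \qquad (6f)$$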

Finally, the linear correlation coefficient becomes

$$r = \left[w_1\sigma_1 + w_2r_{12}\sigma_2\right] \big/ \left(w_1^2\sigma_1^2 + 2w_1w_2r_{12}\sigma_1\sigma_2 + w_2^2\sigma_2^2\right)^{1/2} \qquad (5a)$$

In order to derive the spurious correlation we assume that there is no correlation between $v_1$ and $v_2$, i.e., $r_{12} = 0$. Then

$$r_{sp1} = w_1\sigma_1 \big/ \left(w_1^2\sigma_1^2 + w_2^2\sigma_2^2\right)^{1/2} \qquad (5b)$$

is the spurious correlation. Since the weight $w_1$ and the standard deviation $\sigma_1$ are positive, this coefficient will be positive, too.
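Eqn (5b) can be checked by simulation. The sketch below assumes, hypothetically, normally distributed and mutually independent layer means $v_1$ and $v_2$ (so $r_{12} = 0$), weights corresponding to a 5-level lower layer in a 25-level profile, and arbitrary standard deviations; none of these values come from the paper's wind data. The empirical correlation of $x = v_1$ with $y = w_1v_1 + w_2v_2$ should then agree with the formula even though the two layers are unrelated.

```python
import numpy as np


def spurious_r(w1, sigma1, w2, sigma2):
    """Eqn (5b): correlation of x = v1 with y = w1*v1 + w2*v2 when r12 = 0."""
    return w1 * sigma1 / np.sqrt(w1**2 * sigma1**2 + w2**2 * sigma2**2)


rng = np.random.default_rng(0)
N = 200_000
w1, w2 = 5 / 25, 20 / 25             # hypothetical n_x/n_y and (n_y - n_x)/n_y
sigma1, sigma2 = 6.0, 4.0            # hypothetical layer standard deviations
v1 = rng.normal(10.0, sigma1, N)     # lower-layer mean ...
v2 = rng.normal(20.0, sigma2, N)     # ... independent of the upper-layer mean
x, y = v1, w1 * v1 + w2 * v2

print(spurious_r(w1, sigma1, w2, sigma2))  # formula, about 0.35
print(np.corrcoef(x, y)[0, 1])             # simulation, close to the formula
```

With $w_1 = w_2$ the weights cancel and the function reduces to the familiar $\sigma_1/(\sigma_1^2 + \sigma_2^2)^{1/2}$.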
It may be confusing at first that for $w_2 = 1$ we have the second form of the spurious coefficient, and the expression looks different. It should be noted, however, that now $\sigma^2_{V_2} = w_2^2\sigma^2_{\bar V_2}$, and this compensating change makes the new form identical with the first expression.

The $\sigma^2_{v_2}$ (or $\sigma^2_{V_2}$) is usually not known when the correlation of $A_0$ and $B_0$ is calculated. When $\sigma_{v_2}$ must be obtained from the data, there may be no need for the computation of the spurious correlation, since the coefficient $r_{12}$ can probably be calculated at the same time. When data sets of $A_0$ and $B_0$ are available with $\sigma_{A_0}$, $\sigma_{B_0}$ and $r_{A_0B_0}$, the establishment of $V_2$ (or $\bar V_2$) and the subsequent determination of $r_{12}$ may be costly or time consuming. Then the examination of the spurious correlation would be advantageous. In order to replace $\sigma^2_{v_2}$ in eqn. 5b, we go back to eqn. 2. After some arithmetic we find

$$w_2^2\sigma^2_{v_2} = w_1^2\sigma^2_{A_0} + \sigma^2_{B_0} - 2w_1\sigma_{A_0}\sigma_{B_0}r_{A_0B_0} \qquad (6g)$$

The second form appears with $w_2 = 1$, from which we can deduce that the same terms replace the second term in the denominator of eqn. 5b. Consequently, the spurious correlation may be written as

$$r_{sp1} = w_1\sigma_{A_0} \big/ \left(2w_1^2\sigma^2_{A_0} + \sigma^2_{B_0} - 2w_1\sigma_{A_0}\sigma_{B_0}r_{A_0B_0}\right)^{1/2} \qquad (7)$$

It should be pointed out that $r_{sp1} \le r_{A_0B_0}$.

If we take the equal sign, i.e. $r = r_{sp1} = r_{A_0B_0}$, it can be shown that

$$2cr^3 - r^2(2c^2 + 1) + c^2 = 0$$

where $c = w_1\sigma_{A_0}/\sigma_{B_0}$. This equation can be used to study the relation between the spurious correlation, $\sigma_{A_0}$ and $\sigma_{B_0}$.

We find the two solutions

$$c = r \qquad (7a)$$

or

$$r = w_1\sigma_{A_0}/\sigma_{B_0} \qquad (7b)$$

and

$$c = -r/(1 - 2r^2) \qquad (7c)$$

The latter must be discarded since $0 < r_{sp1} < 1.0$ and $c > 0$.
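The two roots can be confirmed numerically. Dividing numerator and denominator of (7) by $\sigma_{B_0}$ gives $r_{sp1} = c/(2c^2 + 1 - 2cr_{A_0B_0})^{1/2}$, and the cubic above is, as a polynomial in $c$, the quadratic $(1 - 2r^2)c^2 + 2r^3c - r^2 = 0$. The sketch below (illustrative names, a hypothetical value r = 0.4, and the assumption $r^2 \ne 1/2$) solves it with NumPy and checks that $c = r$ makes (7) return $r$ itself.

```python
import numpy as np


def c_roots(r):
    """Roots in c of (1 - 2 r^2) c^2 + 2 r^3 c - r^2 = 0 (assumes r^2 != 1/2)."""
    return np.sort(np.roots([1 - 2 * r**2, 2 * r**3, -r**2]).real)


def spurious_from_observed(c, r_ab):
    """Eqn (7) divided through by sigma_B0, with c = w1 * sigma_A0 / sigma_B0."""
    return c / np.sqrt(2 * c**2 + 1 - 2 * c * r_ab)


r = 0.4
print(c_roots(r))                    # -r/(1 - 2r^2) and r
print(spurious_from_observed(r, r))  # equals r, as in (7a)
```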

Equation (7a) represents the maximum value the spurious correlation could assume when all the correlation between $A_0$ and $B_0$ would be spurious. Since the empirical correlation comprises the spurious and the added part, we must state

$$r_{A_0B_0} = r_{sp} + r_I \qquad (7d)$$

The independent contribution in the empirical coefficient is not known a priori. It should be added that eqn. (7d) is symbolic and a linear addition of the two parts is not applicable, i.e. the $r_I$ is not identical with the $r_{A_0V_2}$ (see eqns. 8 and 9 later).

We may also interpret this spurious correlation coefficient for partially related variables as a weighted correlation. When $w_1 = w_2$ the weights cancel out in eqn. 5b and the notation of the familiar spurious correlation remains.
We may test the significance of the difference between two correlation coefficients by the transformation to Fisher's $z$ (e.g. see Hald, 1952; Brownlee, 1960, etc.). We find

$$z_r = 0.5\,\ln\left[(1 + r)/(1 - r)\right] \qquad (8)$$

and

$$\Delta z = z_{r_1} - z_{r_2} \qquad (9)$$

The $z$ is approximately Gaussian distributed, and the standard error of $z$ can be written as

$$e_z = (N - 3)^{-0.5} \qquad (10)$$

The null hypothesis that two correlation coefficients came from the same population can be tested by $\Delta z$, which is again normally distributed, with standard error

$$e_{\Delta z} = \left[1/(n_1 - 3) + 1/(n_2 - 3)\right]^{0.5} \qquad (11)$$

In our case $n_1 = n_2$, as both samples have the same number of observations. Hence

$$e_{\Delta z} = \sqrt{2}\,(N - 3)^{-0.5} = \sqrt{2}\,e_z \qquad (11a)$$

At the 95% level of significance we accept the null hypothesis when

$$|\Delta z| \le 1.96\,e_{\Delta z} \qquad (11b)$$

Table 1 illustrates the testing procedure with the surface to 5 km versus surface to 25 km windspeed profiles and the 3 km versus 25 km system. The headings should be self-explanatory.
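Eqns (8) through (11b) amount to a short computation. A minimal Python sketch with illustrative names; the coefficients and sample size in the usage lines are hypothetical, not values from Table 1.

```python
import math


def fisher_z(r):
    """Eqn (8): z = 0.5 ln[(1 + r) / (1 - r)], for |r| < 1."""
    return 0.5 * math.log((1 + r) / (1 - r))


def same_population(r1, r2, n1, n2, z_crit=1.96):
    """Eqns (9)-(11b): accept H0 when |dz| <= z_crit * e_dz.

    Returns (accept, dz, e_dz).
    """
    dz = fisher_z(r1) - fisher_z(r2)
    e_dz = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return abs(dz) <= z_crit * e_dz, dz, e_dz


# Hypothetical: an observed r = 0.60 against a spurious r_sp = 0.35, N = 100 each.
accept, dz, e_dz = same_population(0.60, 0.35, 100, 100)
print(accept, round(dz, 4), round(e_dz, 4))
```

With equal sample sizes, `e_dz` equals $\sqrt{2}\,e_z$ as in (11a), so the same routine covers the paper's case directly.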


It can be concluded from Table 1 that in the tropics close to the equator (Albrook, 9°N) the correlation between the two coefficients has the same magnitude as the spurious correlation, but for all other regions the linear relationship goes beyond the one expected merely from a spurious relationship. Since the expectation under zero correlation is the mean value, we could interpret the tropical result as a justification that the most likely windspeed profile above the lower layer in the tropical region is the mean profile.

When the (spurious) correlation coefficient is utilized for extrapolation from the lower layer to the 25 km altitude, this is justified only by the continuity of the speed profile at the top of the lower layer. There is no apparent physical cause to associate the lower and upper layer. This conclusion agrees with the present facts about the general circulation in the tropical zones.

In contrast to the tropical region, the midlatitudes and subtropics appear as one closed system from the surface to the upper layers. The upper boundary of this system reaches far into the stratosphere but cannot be exactly determined from this wind profile analysis. The top of 25 km was chosen by other considerations and should not be taken literally as the limit of the dominance of the wind regime. The question about the upper boundary would have to be investigated by different techniques. From other correlation analysis (see Stewart and Essenwanger, 1968) 20 km appears most likely. In this article the point of interest is a correlation beyond the spurious relationship.

The columns next to the 95% significance value display the correlation ratio η (see Mills, 1955, etc.), which includes a non-linear relationship. As illustrated by this last column in the parts of Table 1, there is virtually no addition to the linear correlation.


2.  Correlation  Between  Slope  and  Means. 

In the development of the characteristic coefficients of the windspeed profile for the layers surface to 3 km and surface to 5 km, the slope is a better representation than the mean. Hence, correlation coefficients between $A_1$ (slope) of the lower layer and $B_0$ (mean) of the total altitude range have been computed.

Since again the windspeed in the lower layers is the same for both profiles, the question arises how much of the correlation is a spurious contribution.

We assume

$$\sum_{h=1}^{h_1} v_h\varphi_{1h}/n_\varphi = A_1 \qquad (12)$$

where $\varphi_1$ is a linear polynomial term and $n_\varphi$ the respective divisor. The $y$ remains the same as defined by eqns. (2). Let us introduce $A_0$ instead of $v_1$ in eqn. (2c); then we can write

$$y = w_1A_0 + w_2v_2 \qquad (13)$$

(The $w_1$ and $w_2$ were defined previously.)

With $x$ and $y$ as customary we derive

$$\sigma_x^2 = \sum(A_1 - \bar A_1)^2/N = \sigma^2_{A_1} \qquad (14a)$$

$$\sigma_y^2 = w_1^2\sigma^2_{A_0} + 2w_1w_2\rho_1\sigma_{A_0}\sigma_{v_2} + w_2^2\sigma^2_{v_2} \qquad (14b)$$

and

$$N\,\mathrm{Cov}_{xy} = \sum(A_1 - \bar A_1)\left[w_1(A_0 - \bar A_0) + w_2(v_2 - \bar v_2)\right]$$

with

$$\mathrm{Cov}_{xy} = w_1R_1\sigma_{A_1}\sigma_{A_0} + w_2\rho_2\sigma_{A_1}\sigma_{v_2} \qquad (14c)$$

We denote $\rho_1 = r_{A_0v_2}$, $\rho_2 = r_{A_1v_2}$, and $R_1 = r_{A_0A_1}$. Then the correlation coefficient follows as

$$r = \left[w_1R_1\sigma_{A_0} + w_2\rho_2\sigma_{v_2}\right] \big/ \left(w_1^2\sigma^2_{A_0} + 2w_1w_2\rho_1\sigma_{A_0}\sigma_{v_2} + w_2^2\sigma^2_{v_2}\right)^{1/2} \qquad (15a)$$

We can now again require, for the determination of the spurious correlation, that wind profiles in the lower layer are independent from the upper layer. This postulation makes $\rho_1$ and $\rho_2$ zero. The third correlation coefficient, $R_1$, stems from the lower layer alone. Thus the spurious correlation is

$$r_{sp2} = w_1R_1\sigma_{A_0} \big/ \left(w_1^2\sigma^2_{A_0} + w_2^2\sigma^2_{v_2}\right)^{1/2} \qquad (15b)$$

We find by comparison with eqn. (5b) that now the spurious correlation comprises the same terms except for the added factor $R_1$:

$$r_{sp2} = r_{sp1} \cdot R_1 \qquad (15c)$$

It is evident that the spurious correlation is again produced by the existence of the variances alone, as in $r_{sp1}$, which will always be positive. The change is now the multiplication by the second factor, $R_1$. It modifies the size and determines the sign of this spurious coefficient. One could call $r_{sp2}$ a spurious correlation coefficient of the second kind. It depends on the relationship between the $A_0$ and $A_1$ coefficients in the lower layer, or in other words the mean and slope and their association. When these two characteristics are independent, $R_1$ is zero and the spurious relationship, too, becomes zero.
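Eqn (15c) can also be checked by simulation. The sketch assumes, hypothetically, that the lower-layer mean $A_0$ and slope $A_1$ are jointly normal with a negative correlation $R_1$, and that the upper-layer term $v_2$ is independent of both ($\rho_1 = \rho_2 = 0$); all numbers are illustrative, not the paper's data. The negative $R_1$ shows how the factor determines the sign of $r_{sp2}$.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200_000
w1, w2 = 0.2, 0.8                 # hypothetical weights
s_a0, s_a1, s_v2 = 6.0, 2.0, 4.0  # hypothetical standard deviations
R1 = -0.5                         # hypothetical correlation of A0 with A1

# Correlated lower-layer mean A0 and slope A1; v2 independent of both.
cov = [[s_a0**2, R1 * s_a0 * s_a1], [R1 * s_a0 * s_a1, s_a1**2]]
a0, a1 = rng.multivariate_normal([0.0, 0.0], cov, size=N).T
v2 = rng.normal(0.0, s_v2, N)

x, y = a1, w1 * a0 + w2 * v2                                     # eqn (13)
r_sp1 = w1 * s_a0 / np.sqrt(w1**2 * s_a0**2 + w2**2 * s_v2**2)  # eqn (5b), sigma_1 = sigma_A0
print(R1 * r_sp1)               # eqn (15c), about -0.18
print(np.corrcoef(x, y)[0, 1])  # simulation, close to the line above
```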

Table 2 lists the actual correlations, the spurious correlation and the correlation between $A_0$ and $A_1$ of the lower layer ($R_1$). The checking procedure was the same as for Table 1 and is not repeated here.

We notice that the spurious correlation in all cases is virtually zero. The remaining correlation should, therefore, be due to the relationship between the lower layers and the one above. One would now expect that the correlation coefficient between $A_1$ and $B_0$ has been adjusted by excluding the spurious part from the $r_{A_0B_0}$, and thus the new $r_{A_1B_0}$ would include this correction. This leaves the optimum correlation between lower and upper layer. This can be confirmed by inspection of Table 2, especially for Albrook, where $r \approx 0$.


Table 2. Correlations Between Slope (A1) of Lower Layer Windspeed Profile and Mean (B0) for Surface to 25 km.

a) 5 km versus 25 km system

              Winter                Spring                Summer                Fall
          r      r_sp   R1      r      r_sp   R1      r      r_sp   R1      r      r_sp   R1
Albrook   ?*     .006   .110   -.043*  .003   .060    .230   .017   .380    .102*  .010   .230
Montg.    .684   .023   .610    .787   .023   .710    .428   .018   .430    .721   .016   .510
Chat.     .587   .019   .610    .690   .024   .640    .7?7   .024   .660    .668   .020   .620
Thule     .321   .026   .740    .484   .030   .700    .607   .024   .470    .516   .023   .560

b) 3 km versus 25 km system

              Winter                Spring                Summer                Fall
          r      r_sp   R1      r      r_sp   R1      r      r_sp   R1      r      r_sp   R1
Albrook   .000*  .003   .120   -.032* -.003  -.110    .136   .004   .180    .007*  .003   .120
Montg.    .536   .010   .650    .592   .007   .620    .249   .009   .420    .546   .007   .470
Chat.     .490   .007   .610    .533   .008   .560    .597   .009   .560    .589   .008   .630
Thule     .268   .006   .430    ?      .009   .560    .333   .005   .230    .302   .005   .300

* Not significantly different from spurious correlation.
? Value or digit illegible in the source.
Since the correlation in the other regions is based on a physical cause, however, the reduction is not very large. In fact, some of the correlations have even increased slightly. This result is not contradictory, as we are dealing with a different parameter, the slope.

Replacement of $\sigma^2_{v_2}$ in eqn. 15b follows from eqn. 6g, as discussed.
3. CORRELATIONS BETWEEN SLOPES.

In  this  phase  we  treat  the  problem  that  both  x  and  y 
represent  a  slope  of  wlndspeed  profiles.  The  determination  of 
the  spurious  correlation  becomes  more  difficult  than  in  the 
previous  cases . 


$$x = \sum_{h=1}^{h_1} v_h \varphi_{1h} / n_1 = A_1 \tag{16a}$$

$$y = \sum_{h=1}^{h_2} v_h \psi_{1h} / n_2 = B_1 \tag{16b}$$

where $\varphi_1$ and $\psi_1$ are linear polynomials of range $h_1$ and $h_2$ with
dividers $n_1 < n_2$. We split y again into a lower and an upper layer
part:

$$y = \Big( \sum_{h=1}^{h_1} v_h \psi_{1h} + \sum_{h=h_1+1}^{h_2} v_h \psi_{1h} \Big) / n_2 \tag{16c}$$

In order to formulate the correlation in terms of the lower and the full
layer, let us replace the $v_h$ in the first term of y by an orthogonal
polynomial expression

$$v_h = A_0 + A_1 \varphi_{1h} + A_2 \varphi_{2h} + \cdots \tag{17}$$

This substitution is merely a representation of the lower layer
windspeed profile by polynomials and can be expanded to higher
order if necessary. Then

$$y = \Big[ \sum_{h=1}^{h_1} (A_0 + A_1 \varphi_{1h} + A_2 \varphi_{2h}) \psi_{1h} / n_2 \Big] + V_2 \tag{16d}$$

with the abbreviation

$$V_2 = \sum_{h=h_1+1}^{h_2} v_h \psi_{1h} / n_2 \tag{16e}$$
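The y can be further developed into

$$y = A_0 \sum \psi_{1h}/n_2 + A_1 \sum \varphi_{1h}\psi_{1h}/n_2 + A_2 \sum \varphi_{2h}\psi_{1h}/n_2 + \cdots + V_2$$

$$y = a_0 A_0 + a_1 A_1 + a_2 A_2 + \cdots + V_2$$

$$a_0 = \sum \psi_{1h}/n_2, \qquad a_1 = \sum \varphi_{1h}\psi_{1h}/n_2, \qquad a_2 = \sum \varphi_{2h}\psi_{1h}/n_2 \tag{18c}$$

where the summation is carried out over the range of the lower
layer only.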
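The coefficients of (18c) are easy to evaluate numerically. The sketch below assumes equally spaced levels $h = 1, \ldots, h_2$, takes $\psi_1$ as the centered linear polynomial over the full range with divider $n_2 = \sum \psi_{1h}^2$, and uses centered orthogonal polynomials $\varphi_0 = 1$, $\varphi_1$, $\varphi_2$ on the lower layer; these choices and all function names are illustrative assumptions, not the author's program. For a lower-layer profile that is exactly quadratic, $a_0A_0 + a_1A_1 + a_2A_2 + V_2$ should reproduce the full-layer slope $B_1$ of (16b).

```python
# Sketch of eqn (18c): a_k = sum over the lower layer of phi_kh * psi_1h / n_2.
# Assumptions (hypothetical, for illustration): equally spaced levels 1..h2,
# psi_1 = centered linear polynomial over the full range, n_2 = sum(psi_1h ** 2).

def centered_basis(levels):
    """phi_0 = 1, phi_1 (centered linear), phi_2 (centered quadratic).

    On equally spaced levels these three polynomials are mutually orthogonal."""
    m = len(levels)
    mean = sum(levels) / m
    phi1 = [h - mean for h in levels]
    squares = [p * p for p in phi1]
    c = sum(squares) / m
    phi2 = [s - c for s in squares]
    return [[1.0] * m, phi1, phi2]


def full_layer_psi(h2):
    """Centered linear polynomial psi_1 over levels 1..h2 and its divider n_2."""
    levels = list(range(1, h2 + 1))
    mean = sum(levels) / h2
    psi = [h - mean for h in levels]
    return psi, sum(p * p for p in psi)


def coefficients_a(h1, h2):
    """Return [a_0, a_1, a_2]; the sums run over the lower layer h = 1..h1 only."""
    psi, n2 = full_layer_psi(h2)
    phis = centered_basis(list(range(1, h1 + 1)))
    return [sum(phi[i] * psi[i] for i in range(h1)) / n2 for phi in phis]


def lower_layer_coefficients(v_lower):
    """A_k = sum(v_h phi_kh) / sum(phi_kh ** 2): the orthogonal-polynomial fit of eqn (17)."""
    phis = centered_basis(list(range(1, len(v_lower) + 1)))
    return [sum(v * p for v, p in zip(v_lower, phi)) / sum(p * p for p in phi)
            for phi in phis]
```

Because the polynomials are orthogonal, each $A_k$ is a separate projection, which is why (17) can be extended to higher order without refitting the lower-order terms.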

By analogy we derive $\bar x$ and $\bar y$ by the usual operation.

Finally we calculate

$$y - \bar y = a_0(A_0 - \bar A_0) + a_1(A_1 - \bar A_1) + a_2(A_2 - \bar A_2) + \cdots + (V_2 - \bar V_2) \tag{19a}$$


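$$\sigma_y^2 = a_0^2\sigma_{A_0}^2 + a_1^2\sigma_{A_1}^2 + a_2^2\sigma_{A_2}^2 + \cdots + \sigma_{V_2}^2 + 2a_0a_1R_1\sigma_{A_0}\sigma_{A_1} + 2a_0a_2R_2\sigma_{A_0}\sigma_{A_2} + 2a_1a_2R_3\sigma_{A_1}\sigma_{A_2} + \cdots + 2a_0\rho_1\sigma_{A_0}\sigma_{V_2} + 2a_1\rho_2\sigma_{A_1}\sigma_{V_2} + 2a_2\rho_3\sigma_{A_2}\sigma_{V_2} \tag{19b}$$

where the $\rho_i$ are correlation coefficients between lower and upper
layer, and the $R_i$ are the correlation coefficients among the lower-layer
coefficients $A_0$, $A_1$, $A_2$.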


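The covariance becomes

$$n\,\mathrm{cov} = \sum (A_1 - \bar A_1)\big[a_0(A_0 - \bar A_0) + a_1(A_1 - \bar A_1) + a_2(A_2 - \bar A_2) + \cdots + (V_2 - \bar V_2)\big]$$

$$\mathrm{cov} = a_0R_1\sigma_{A_0}\sigma_{A_1} + a_1\sigma_{A_1}^2 + a_2R_3\sigma_{A_1}\sigma_{A_2} + \cdots + \rho_2\sigma_{A_1}\sigma_{V_2} \tag{19c}$$

Again we derive the correlation coefficient

$$r = \big(a_0R_1\sigma_{A_0} + a_1\sigma_{A_1} + a_2R_3\sigma_{A_2} + \cdots + \rho_2\sigma_{V_2}\big) / \sigma_y \tag{20}$$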

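In the spurious correlation coefficient we assume that no relationship
between lower and upper layer exists. Hence, $\rho_1 = \rho_2 = \rho_3 = 0$:

$$r_{sp3} = \big(a_0R_1\sigma_{A_0} + a_1\sigma_{A_1} + a_2R_3\sigma_{A_2} + \cdots\big) / \sigma_y \tag{20a}$$

with

$$\sigma_y^2 = a_0^2\sigma_{A_0}^2 + a_1^2\sigma_{A_1}^2 + a_2^2\sigma_{A_2}^2 + 2a_0a_1R_1\sigma_{A_0}\sigma_{A_1} + 2a_0a_2R_2\sigma_{A_0}\sigma_{A_2} + 2a_1a_2R_3\sigma_{A_1}\sigma_{A_2} + \cdots + \sigma_{V_2}^2 \tag{20b}$$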


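We determine $\sigma_{V_2}^2$ from the observed variance $\sigma_{B_1}^2 = \sigma_y^2$ of the
full-layer slope and the observed correlations $r_{A_iB_1}$ between the lower-layer
coefficients and $B_1$, which eliminate the $\rho_i$ from (19b):

$$\sigma_{V_2}^2 = \sigma_{B_1}^2 + a_0^2\sigma_{A_0}^2 + a_1^2\sigma_{A_1}^2 + a_2^2\sigma_{A_2}^2 + 2a_0a_1R_1\sigma_{A_0}\sigma_{A_1} + 2a_0a_2R_2\sigma_{A_0}\sigma_{A_2} + 2a_1a_2R_3\sigma_{A_1}\sigma_{A_2} + \cdots - 2\sigma_{B_1}\big(a_0r_{A_0B_1}\sigma_{A_0} + a_1r_{A_1B_1}\sigma_{A_1} + a_2r_{A_2B_1}\sigma_{A_2} + \cdots\big) \tag{20c}$$

and, substituting into (20b),

$$\sigma_y^2 = 2\big(a_0^2\sigma_{A_0}^2 + a_1^2\sigma_{A_1}^2 + a_2^2\sigma_{A_2}^2 + \cdots\big) + 4\big(a_0a_1R_1\sigma_{A_0}\sigma_{A_1} + a_0a_2R_2\sigma_{A_0}\sigma_{A_2} + a_1a_2R_3\sigma_{A_1}\sigma_{A_2} + \cdots\big) + \sigma_{B_1}^2 - 2\sigma_{B_1}\big(a_0r_{A_0B_1}\sigma_{A_0} + a_1r_{A_1B_1}\sigma_{A_1} + a_2r_{A_2B_1}\sigma_{A_2} + \cdots\big) \tag{20d}$$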
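When no correlation between the coefficients of the lower layer
exists, i.e. $R_1 = R_2 = R_3 = \cdots = 0$, then

$$r_{sp3} = a_1\sigma_{A_1} \big/ \big(a_0^2\sigma_{A_0}^2 + a_1^2\sigma_{A_1}^2 + a_2^2\sigma_{A_2}^2 + \cdots + \sigma_{V_2}^2\big)^{1/2} \tag{20e}$$

or, with $\sigma_{V_2}^2$ replaced,

$$r_{sp3} = a_1\sigma_{A_1} \big/ \big[2a_0^2\sigma_{A_0}^2 + 2a_1^2\sigma_{A_1}^2 + 2a_2^2\sigma_{A_2}^2 + \cdots + \sigma_{B_1}^2 - 2\sigma_{B_1}\big(a_0r_{A_0B_1}\sigma_{A_0} + a_1r_{A_1B_1}\sigma_{A_1} + a_2r_{A_2B_1}\sigma_{A_2} + \cdots\big)\big]^{1/2} \tag{20f}$$

This is the logically expanded form of $r_{sp1}$ and $r_{sp2}$. Usually
some of the correlations $R_1$, $R_2$ or $R_3$ are not zero, and the computation
must be based on (20a).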
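The general case (20a)-(20b) and the elimination step (20c) can be sketched as two small functions, truncated after $A_2$ as in the formulas. The argument layout (tuples for $a_i$, $\sigma_{A_i}$ and $R_1 = \mathrm{corr}(A_0, A_1)$, $R_2 = \mathrm{corr}(A_0, A_2)$, $R_3 = \mathrm{corr}(A_1, A_2)$) and the function names are hypothetical choices for illustration.

```python
import math

# Sketch of eqns (20a)-(20b) and (20c), truncated after A_2.
# Hypothetical argument layout: a = (a0, a1, a2), sigma_A = (s0, s1, s2),
# R = (R1, R2, R3) with R1 = corr(A0, A1), R2 = corr(A0, A2), R3 = corr(A1, A2).

def lower_layer_variance_part(a, sigma_A, R):
    """Part of sigma_y^2 contributed by the lower-layer coefficients (rho terms excluded)."""
    a0, a1, a2 = a
    s0, s1, s2 = sigma_A
    R1, R2, R3 = R
    return (a0**2 * s0**2 + a1**2 * s1**2 + a2**2 * s2**2
            + 2 * a0 * a1 * R1 * s0 * s1
            + 2 * a0 * a2 * R2 * s0 * s2
            + 2 * a1 * a2 * R3 * s1 * s2)


def spurious_r_sp3(a, sigma_A, R, sigma_V2):
    """Eqn (20a) with sigma_y from (20b): rho_i = 0 (no lower/upper layer relationship)."""
    a0, a1, a2 = a
    s0, s1, s2 = sigma_A
    R1, _, R3 = R
    numerator = a0 * R1 * s0 + a1 * s1 + a2 * R3 * s2
    var_y = lower_layer_variance_part(a, sigma_A, R) + sigma_V2**2
    return numerator / math.sqrt(var_y)


def sigma_v2_squared(a, sigma_A, R, sigma_B1, r_AB):
    """Eqn (20c): sigma_V2^2 from the observed sigma_B1 and r_{A_i B_1}, i = 0, 1, 2."""
    cross = sum(ai * ri * si for ai, ri, si in zip(a, r_AB, sigma_A))
    return sigma_B1**2 + lower_layer_variance_part(a, sigma_A, R) - 2 * sigma_B1 * cross
```

With all $R_i = 0$, `spurious_r_sp3` reduces to (20e), which gives a quick consistency check on the implementation.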

The actual $r_{sp3}$ can again assume positive and negative values,
depending on the $a_i$.

The analysis results for the surface to 3 km and surface to 5 km
wind profile systems (slopes) are illustrated in Table 3. This table
depicts the correlation between the slopes and the spurious correlation
in the upper part. The correlation between $A_0$, $A_2$ of the
lower layer and the slope of the entire layer up to 5 km is added
in the middle part, while the lower part contains the intercorrelations
of the surface to 3 km layer.

It turns out again that the spurious correlation is close to
zero in the subtropics, midlatitudes and polar regions, while it is
significantly different from zero in the tropical region. In contrast
to the previous results, however, the actual (linear) correlation
coefficient differs significantly from the spurious
correlation at the 95% level. Although the possibility exists
that the difference between spurious and actual correlation
in the tropics could be attributed to physical cause, the
suspicion of identity between spurious and actual correlation remains.
Two factors may contribute to produce significance.

Since N > 1000 in all cases, even small differences
between spurious and actual correlations are significant.
Since eqn. (17) is an approximation, the inclusion
of a third order term may bring both coefficients closer together.
Whether the actual spurious correlation is underestimated, however,
cannot be readily predicted. There is no doubt that the correlations
in the other climatic regions are real.

Again,  the  correlation  ratio  did  not  prove  of  any  practical 
value  beyond  the  linear  relationship  and  has  been  omitted  from 
publication. 


Table 3. Correlation Between Slope ($A_1$) of 3 km Windspeed Profile
and Slope ($B_1$) of 5 km Windspeed Profile.

Upper part, r and r_sp (values partially illegible):
          .78   .13   .73   -.05   .78   .75   -.14
Thule     .11   .09

Middle part, Correlations Between $A_0$, $A_2$ and $B_1$:
Montg.    .42   .53   .17   .31   .37   .28   .33   .24
Thule     .15   .33   .28   .30   .17   .34

Lower part: Intercorrelations, surface to 3 km Windspeed Profiles.
CONCLUSION. It has been demonstrated that the linear correlation
coefficient may be spurious when the data are only partially related.

The case investigated in this article deals with the particular problem
in which one parameter is computed from a subset of the total data.

Three cases were examined. The first case was based on the condition
where two mean values are computed, one of them from a
subset of the data. As the example for windspeed profiles in four
different climatic regions shows (Table 1), a spurious correlation
different from zero emerges in all four zones, but only in the tropical
zone does it appear that the actual correlation is identical with the
spurious one, as tested at the 95% level of significance.
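The first case can be illustrated with a small simulation. If the levels are assumed to be independent with equal variance (a deliberately "unrelated" situation, and an assumption of this sketch, not a property of real windspeed profiles), the mean of the first m levels and the mean of all n levels still correlate, because they share m of the data; their correlation is $\sqrt{m/n}$. The function names below are hypothetical.

```python
import math
import random

# Monte Carlo sketch: spurious correlation between the mean of a subset (first m
# levels) and the mean of all n levels when the levels are independent N(0, 1).
# For this idealized case the theoretical correlation is sqrt(m / n).

def pearson(x, y):
    """Ordinary product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_subset_mean_correlation(m, n, trials, seed=1):
    rng = random.Random(seed)
    subset_means, full_means = [], []
    for _ in range(trials):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]   # independent "profile" values
        subset_means.append(sum(v[:m]) / m)
        full_means.append(sum(v) / n)
    return pearson(subset_means, full_means)


# Example: a 5-level lower layer inside a 25-level profile gives r close to sqrt(0.2) = 0.447.
r = simulate_subset_mean_correlation(5, 25, 20000)
```

A correlation of about 0.45 thus appears even though no physical relationship was built into the data.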

The study is expanded to examine the spurious correlation between the slope
in the lower layer and the mean of the entire altitude range. The spurious
correlation must be modified by the inclusion of a correlation term.

The example for windspeed profiles from four climatic zones shows that
this time the spurious correlation is approximately zero. Adjustments
reflecting this reduction of the spurious correlation appear in the
correlation coefficients, especially in the tropical zone.

The last case deals with the problem of two slopes. The spurious
correlation assumes a more intricate form containing various correlation terms.
The empirical example for windspeed profiles exhibits a spurious correlation
different from zero only in the tropical zone. Present tests indicate,
however, that in this zone the spurious and actual correlation coefficients
are different, too, as tested at the 95% level of significance.

The spurious correlation has been largely developed in this study to check
correlation coefficients between characteristic coefficients of windspeed
profiles, which were established in the analysis of wind data for missile
design and operational purposes. The consequences go far beyond this
limit, however, and the method has applications in various fields of
statistical analysis. A further main goal of this article was to illustrate
that spurious correlations can arise in cases where only
parts of the data are related. It is, therefore, advisable to examine the
parameters to be correlated for any source of possible spurious relationship
before conclusions are drawn about physical causes of any
existing correlation.
THE LEAST SQUARES ANALYSIS OF DATA GENERATED BY
A "PIECE-WISE" GENERAL LINEAR MODEL


Robert L. Launer

Army Procurement Research Office
Fort Lee, Virginia


This study was motivated by the frequent appearance of economic data
which can be described as piece-wise linear with certain "end-point" or
"cross-sectional" constraints. (Figure 1 depicts several examples of this.)
The study is intended mainly for the field analyst who is confronted with
this type of data and has insufficient time to work out more than the barest
details.

All of the models in figure 1 can be expressed as straight lines
within each of several intervals with a linear constraint on the parameters.
For example, the broken line can be represented as follows:

$$y = \begin{cases} a_1 + b_1x, & x < x^* \\ a_2 + b_2x, & x > x^* \end{cases}$$

subject to the constraint:

$$a_1 + b_1x^* + d = a_2 + b_2x^*.$$

The "bent line" model is just the "broken line" with d = 0, and the third
example in figure 1 has the same form of representation except that the
auxiliary condition is just $b_1$ -
Examples of models which follow this broken line pattern, but which
are not linear within each interval, also exist. For example, a corporation's
stable economic growth might be suddenly interrupted by an external
factor such as a war or merger, after which its growth is again stable
but progressing at a different rate. If the growth equation contains
only linear parameters and the exact time of the change in growth is
known, then this model can be analyzed with the methods outlined in this
paper.

The theory of least squares and regression analysis subject to
parametric constraints has been treated extensively. In this paper,
the theory subject to linear parametric constraints is presented and
cast in a form which allows easy adaptation to "piece-wise" general
linear models. A test for linearity of data is proposed and, finally,
the theory is illustrated with data obtained from US Army cost-incentive
contracts. A general familiarity with the theory of linear hypotheses
will be assumed.

1. Regression Analysis with Linear Constraints.

Let it be required to estimate the elements of $\beta' = (\beta_1, \beta_2, \ldots, \beta_p)$
from the n observations $y' = (y_1, y_2, \ldots, y_n)$ which are generated by the
general linear model $y = X\beta + \epsilon$. If $\epsilon' = (\epsilon_1, \ldots, \epsilon_n)$ is the error vector
and the matrix X is known, of dimension n × p and of rank p < n, then the
least squares estimate of β is

$$\hat\beta = (X'X)^{-1}X'y. \tag{1.1}$$


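Suppose now that the elements of β are constrained by the k < p
known linearly independent relationships:

$$\begin{aligned} t_{11}\beta_1 + t_{12}\beta_2 + \cdots + t_{1p}\beta_p &= d_1 \\ &\;\;\vdots \\ t_{k1}\beta_1 + t_{k2}\beta_2 + \cdots + t_{kp}\beta_p &= d_k \end{aligned} \tag{1.2}$$

If $T = (t_{ij})$ and $d' = (d_1, d_2, \ldots, d_k)$, equations (1.2) may be written

$$T\beta = d. \tag{1.3}$$

Note that T is of full rank k < p.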

In order to minimize $(y - X\beta)'(y - X\beta)$ with respect to β subject to the
constraint (1.2), the method of Lagrangian multipliers immediately suggests
itself. Let $\lambda' = (\lambda_1, \ldots, \lambda_k)$. It is necessary then to find the extreme
value of

$$(y - X\beta)'(y - X\beta) + 2\lambda'(T\beta - d) \tag{1.4}$$

with respect to β and λ. Differentiation of (1.4) with respect to β and λ
yields the "constrained" normal equations:

$$X'X\tilde\beta + T'\lambda = X'y, \qquad T\tilde\beta = d. \tag{1.5}$$

Solving (1.5) for $\tilde\beta$ yields:

$$\tilde\beta = (X'X)^{-1}X'y - (X'X)^{-1}T'\lambda, \tag{1.6}$$

where

$$\lambda = [T(X'X)^{-1}T']^{-1}[T(X'X)^{-1}X'y - d]. \tag{1.7}$$
Thus from (1.1):

$$\tilde\beta = \hat\beta - (X'X)^{-1}T'[T(X'X)^{-1}T']^{-1}(T\hat\beta - d). \tag{1.8}$$

Notice that $\tilde\beta$ is a linear combination of the components of y and therefore,
since Tβ = d, is unbiased for β (as it must be). Furthermore, when y is
normally distributed, $\tilde\beta$ is normally distributed with

$$\mathrm{cov}(\tilde\beta) = \sigma^2\{(X'X)^{-1} - (X'X)^{-1}T'[T(X'X)^{-1}T']^{-1}T(X'X)^{-1}\}, \tag{1.9}$$

and the rank of $\mathrm{cov}(\tilde\beta)$ is p - k.
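Formula (1.8) translates directly into a few matrix operations. The sketch below assumes NumPy and a full-rank X and T; the function name is hypothetical. As a check, the result should agree with a direct solution of the constrained normal equations (1.5) written as one bordered linear system.

```python
import numpy as np

# Sketch of eqn (1.8): constrained least squares via the Lagrange-multiplier formula.
# Assumes X (n x p) has full column rank and T (k x p) has full row rank k < p.

def constrained_least_squares(X, y, T, d):
    """Return (beta_hat, beta_tilde): unconstrained (1.1) and constrained (1.8) estimates."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y                   # eqn (1.1)
    W = T @ XtX_inv @ T.T                          # T (X'X)^-1 T', a k x k matrix
    lam = np.linalg.solve(W, T @ beta_hat - d)     # Lagrange multipliers, eqn (1.7)
    beta_tilde = beta_hat - XtX_inv @ T.T @ lam    # eqn (1.8)
    return beta_hat, beta_tilde


# Example with random data and two linear constraints on four parameters.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
y = rng.normal(size=20)
T = np.array([[1.0, -1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])
d = np.array([0.5, 2.0])
beta_hat, beta_tilde = constrained_least_squares(X, y, T, d)
```

Solving with `np.linalg.solve` rather than forming $[T(X'X)^{-1}T']^{-1}$ explicitly is a numerical convenience; the estimate is the same as in (1.8).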

The remaining distributional properties of $\tilde\beta$ and the tests of linear
hypotheses are not difficult to establish. First, recall that

$$(y - X\beta)'(y - X\beta) \sim \sigma^2\chi^2_n. \tag{1.10}$$

Furthermore, if M is an n × n matrix, then

$$(y - X\beta)'(y - X\beta) = (y - X\beta)'(I - M)(y - X\beta) + (y - X\beta)'M(y - X\beta). \tag{1.11}$$

Finally, from (1.3) and (1.5) it may be observed that

$$(\tilde\beta - \beta)'X'(y - X\tilde\beta) = (T\tilde\beta - d)'[T(X'X)^{-1}T']^{-1}(T\hat\beta - d) = 0,$$

and it follows that

$$(y - X\beta)'(y - X\beta) = (y - X\tilde\beta)'(y - X\tilde\beta) + (\tilde\beta - \beta)'X'X(\tilde\beta - \beta). \tag{1.12}$$

If

$$M = X(X'X)^{-1}X' - X(X'X)^{-1}T'[T(X'X)^{-1}T']^{-1}T(X'X)^{-1}X', \tag{1.13}$$

it also follows that

$$(\tilde\beta - \beta)'X'X(\tilde\beta - \beta) = (y - X\beta)'M(y - X\beta). \tag{1.14}$$

Notice that M is idempotent of rank p - k. Therefore (1.14) is distributed
as $\sigma^2\chi^2_{p-k}$ so long as $y \sim N(X\beta, \sigma^2 I)$. Finally, from (1.12), (1.14) and
Cochran's Theorem, the distribution of the residual or error sum of squares is:

$$(y - X\tilde\beta)'(y - X\tilde\beta) \sim \sigma^2\chi^2_{n-(p-k)}. \tag{1.15}$$

The error sum of squares (1.15) is related to the "unconstrained" error
sum of squares, as can be shown by direct substitution of (1.8) into (1.15):

$$(y - X\tilde\beta)'(y - X\tilde\beta) = (y - X\hat\beta)'(y - X\hat\beta) + [T\hat\beta - d]'[T(X'X)^{-1}T']^{-1}[T\hat\beta - d].$$

This formula leads to computational efficiencies later.
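The identity above means the constrained error sum of squares never requires refitting: it is the unconstrained one plus a k × k quadratic form. The following NumPy sketch (function names hypothetical) computes it that way; the check compares it with the residual sum of squares of a direct constrained fit.

```python
import numpy as np

# Sketch: constrained error sum of squares from the unconstrained fit,
#   SSE_c = SSE_u + (T b - d)' [T (X'X)^-1 T']^-1 (T b - d).

def sse(X, y, beta):
    """Residual sum of squares (y - X beta)'(y - X beta)."""
    r = y - X @ beta
    return float(r @ r)


def constrained_sse_shortcut(X, y, T, d):
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y            # unconstrained estimate, eqn (1.1)
    g = T @ beta_hat - d                    # departure of beta_hat from the constraints
    penalty = float(g @ np.linalg.solve(T @ XtX_inv @ T.T, g))
    return sse(X, y, beta_hat) + penalty


rng = np.random.default_rng(1)
X = rng.normal(size=(15, 3))
y = rng.normal(size=15)
T = np.array([[1.0, 1.0, 1.0]])
d = np.array([1.0])
sse_c = constrained_sse_shortcut(X, y, T, d)
```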

Tests of hypotheses regarding the elements of β must be conducted
with slightly more care than is usual. Evidently, any test of hypothesis
on β subject to the given constraints Tβ = d may be regarded as a test of
hypothesis on a second set of constraints, $T_0\beta = d_0$, given the first set.

Since β contains p elements and the rank of T is k < p, the
test of hypothesis may be expressed as no more than p - k linearly
independent equations. In other words, the rank of $T_0$ may be no more
than p - k.
Suppose that the matrix $T_0$ referred to in the preceding paragraph is
m × p of full rank m, and that the rows of $T_0$ are linearly independent
of the rows of T. Then define the augmented matrix T* of dimension
(k+m) × p and rank k+m, and the augmented vector d* of dimension k+m,
as follows:

$$T^* = \begin{bmatrix} T \\ T_0 \end{bmatrix}, \qquad d^* = \begin{bmatrix} d \\ d_0 \end{bmatrix}.$$

The null hypothesis, $H_0$, is $T_0\beta = d_0$ and the alternative is $T_0\beta \ne d_0$.
Then the sum of squares due to β subject to T*, SS*(β), is given by
(1.13) with T replaced by T*:

$$SS^*(\beta) = (y - X\beta)'M^*(y - X\beta), \tag{1.16}$$

$$M^* = X(X'X)^{-1}X' - X(X'X)^{-1}T^{*\prime}[T^*(X'X)^{-1}T^{*\prime}]^{-1}T^*(X'X)^{-1}X'. \tag{1.17}$$

If $H_0$ is true, then

$$SS^*(\beta) \sim \sigma^2\chi^2_{p-k-m}. \tag{1.18}$$

Note that M* is idempotent of rank p-k-m. The "corrected" sum of squares
(Graybill) to test $H_0$ is the difference of (1.14) and (1.16):

$$(y - X\beta)'(M - M^*)(y - X\beta). \tag{1.19}$$

For computational purposes, this may be written as

$$[T^*\hat\beta - d^*]'[T^*(X'X)^{-1}T^{*\prime}]^{-1}[T^*\hat\beta - d^*] - [T\hat\beta - d]'[T(X'X)^{-1}T']^{-1}[T\hat\beta - d]. \tag{1.20}$$
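Expression (1.20) needs only the unconstrained estimate $\hat\beta$. The NumPy sketch below computes it and forms an F ratio, under the assumption (not stated in the visible text) that the test statistic (1.27) is the usual ratio of (1.20)/m to the constrained error sum of squares divided by n - (p - k), the degrees of freedom from (1.15) and (1.18); with m = 3 and n - (p - k) = 24 this matches the $F_{3,24}$ of the procurement example. Function names are hypothetical.

```python
import numpy as np

# Sketch of eqn (1.20) and an F ratio built from it. Assumed form of the test:
#   F = [(1.20) / m] / [SSE under T beta = d / (n - (p - k))].

def _quad(Tm, dv, XtX_inv, beta_hat):
    """(T b - d)' [T (X'X)^-1 T']^-1 (T b - d)."""
    g = Tm @ beta_hat - dv
    return float(g @ np.linalg.solve(Tm @ XtX_inv @ Tm.T, g))


def hypothesis_ss(X, y, T, d, T0, d0):
    """Eqn (1.20): corrected sum of squares for H0: T0 beta = d0, given T beta = d."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    T_star = np.vstack([T, T0])
    d_star = np.concatenate([d, d0])
    return _quad(T_star, d_star, XtX_inv, beta_hat) - _quad(T, d, XtX_inv, beta_hat)


def f_statistic(X, y, T, d, T0, d0):
    n, p = X.shape
    k, m = T.shape[0], T0.shape[0]
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    # constrained SSE via the identity following (1.15)
    sse_constrained = float(resid @ resid) + _quad(T, d, XtX_inv, beta_hat)
    return (hypothesis_ss(X, y, T, d, T0, d0) / m) / (sse_constrained / (n - (p - k)))
```

The difference in (1.20) is exactly the increase in the constrained error sum of squares when the hypothesis rows $T_0$ are added to T, which is what the check below confirms.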
In order to investigate the properties of (1.20), introduce the
notation:

$$A = T(X'X)^{-1}X', \qquad A_0 = T_0(X'X)^{-1}X', \qquad A^* = T^*(X'X)^{-1}X' = \begin{bmatrix} A \\ A_0 \end{bmatrix}. \tag{1.21}$$

Then the sums of squares (1.15) (through (1.13), (1.14), and (1.11))
and (1.19) can be written respectively as:

$$(y - X\beta)'[I - X(X'X)^{-1}X' + A'(AA')^{-1}A](y - X\beta) \tag{1.22}$$

and

$$(y - X\beta)'[A^{*\prime}(A^*A^{*\prime})^{-1}A^* - A'(AA')^{-1}A](y - X\beta). \tag{1.23}$$

Now, notice that both $A'[AA']^{-1}A$ and $A^{*\prime}[A^*A^{*\prime}]^{-1}A^*$ are idempotent.
Furthermore, $X(X'X)^{-1}X'A' = A'$ and $X(X'X)^{-1}X'A^{*\prime} = A^{*\prime}$. Since the inverse
of $[A^*A^{*\prime}]$ is

$$[A^*A^{*\prime}]^{-1} = \begin{bmatrix} [AA' - AA_0'(A_0A_0')^{-1}A_0A']^{-1} & -(AA')^{-1}AA_0'[A_0A_0' - A_0A'(AA')^{-1}AA_0']^{-1} \\ -(A_0A_0')^{-1}A_0A'[AA' - AA_0'(A_0A_0')^{-1}A_0A']^{-1} & [A_0A_0' - A_0A'(AA')^{-1}AA_0']^{-1} \end{bmatrix}, \tag{1.24}$$

then $A^{*\prime}[A^*A^{*\prime}]^{-1}A^*$ is

$$[A' - A_0'(A_0A_0')^{-1}A_0A'][AA' - AA_0'(A_0A_0')^{-1}A_0A']^{-1}A + A_0'[A_0A_0' - A_0A'(AA')^{-1}AA_0']^{-1}[A_0 - A_0A'(AA')^{-1}A]. \tag{1.25}$$

The matrix (1.24) is symmetric, from which fact one can show that each of
the two terms of (1.25) is symmetric. Therefore,

$$A^{*\prime}[A^*A^{*\prime}]^{-1}A^*A'[AA']^{-1}A = A'[AA']^{-1}A. \tag{1.26}$$




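subject to the constraints:

$$T_2\,\beta = d, \quad \ldots, \quad T_r\,\beta = d_r \tag{2.2}$$

Equation (2.1) may be written

$$y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_r \end{bmatrix} = \begin{bmatrix} X_1\beta_1 + \epsilon_1 \\ X_2\beta_2 + \epsilon_2 \\ \vdots \\ X_r\beta_r + \epsilon_r \end{bmatrix}$$

subject to the constraints:

$$T\beta = d.$$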


It is clear that X, X'X, and $(X'X)^{-1}$ may be written as partitioned matrices
with zero in every non-diagonal block:

$$X = \begin{bmatrix} X_1 & 0 & \cdots & 0 \\ 0 & X_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & X_r \end{bmatrix}, \tag{2.3}$$

and similarly for X'X and $(X'X)^{-1}$.

A glance at equations (1.8), (1.16) and (1.20) indicates that the
calculations involved in obtaining $\tilde\beta$ and the F statistic are simplified,
since $\hat\beta$ is made up of the separate estimates $\hat\beta_j = (X_j'X_j)^{-1}X_j'y_j$.


3. The Broken Line Example.

Suppose it is necessary to estimate the parameters in the broken
line model

$$y_i = \begin{cases} a_1 + b_1(x_i - \bar x_1) + \epsilon_i, & x_i \le x^*, \quad i = 1, 2, \ldots, m_1 \\ a_2 + b_2(x_i - \bar x_2) + \epsilon_i, & x^* < x_i, \quad i = m_1 + 1, \ldots, n \end{cases} \tag{3.1}$$

$$\bar x_1 = \frac{1}{m_1}\sum_{x_i \le x^*} x_i \quad\text{and}\quad \bar x_2 = \frac{1}{m_2}\sum_{x_i > x^*} x_i. \tag{3.2}$$

The subscripts 1 and 2 refer to the elements of the model and their
estimates which lie to the left and right hand side, respectively, of the
discontinuity point x*. x* and d are known, and $m_2 = n - m_1$.

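Let $X_1$ be the $m_1 \times 2$ matrix whose i-th row is $(1,\; x_i - \bar x_1)$, $i = 1, \ldots, m_1$,
and similarly for $X_2$. Then

$$X = \begin{bmatrix} X_1 & 0 \\ 0 & X_2 \end{bmatrix}, \tag{3.3}$$

$$T = [1,\; x^* - \bar x_1,\; -1,\; -(x^* - \bar x_2)] \quad\text{and}\quad \beta' = [a_1, b_1, a_2, b_2], \tag{3.4}$$

so that the broken line model can be written in matrix notation,

$$y = X\beta + \epsilon \quad\text{subject to}\quad T\beta = d.$$

The matrix $(X'X)^{-1}$ is

$$(X'X)^{-1} = \begin{bmatrix} 1/m_1 & 0 & 0 & 0 \\ 0 & 1/S_{xx1} & 0 & 0 \\ 0 & 0 & 1/m_2 & 0 \\ 0 & 0 & 0 & 1/S_{xx2} \end{bmatrix},$$

where $S_{xxj}$ is the sum of squared deviations of the $x_i$ about $\bar x_j$ in segment j, and

$$T(X'X)^{-1}T' = \frac{1}{m_1} + \frac{(x^* - \bar x_1)^2}{S_{xx1}} + \frac{1}{m_2} + \frac{(x^* - \bar x_2)^2}{S_{xx2}}.$$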


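The results of section 1 give:

$$\tilde a_1 = \bar y_1 - \frac{D}{m_1}, \qquad \tilde b_1 = \hat b_1 - \frac{x^* - \bar x_1}{S_{xx1}}\,D, \qquad \tilde a_2 = \bar y_2 + \frac{D}{m_2}, \qquad \tilde b_2 = \hat b_2 + \frac{x^* - \bar x_2}{S_{xx2}}\,D, \tag{3.6}$$

where

$$D = \frac{\bar y_1 + (x^* - \bar x_1)\hat b_1 - \bar y_2 - (x^* - \bar x_2)\hat b_2 - d}{\dfrac{1}{m_1} + \dfrac{(x^* - \bar x_1)^2}{S_{xx1}} + \dfrac{1}{m_2} + \dfrac{(x^* - \bar x_2)^2}{S_{xx2}}}.$$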

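It is interesting to note that, if

$$\hat y_1^* = \hat a_1 + \hat b_1(x^* - \bar x_1), \qquad \hat y_2^* = \hat a_2 + \hat b_2(x^* - \bar x_2),$$

then

$$D = \frac{\sigma^2\,(\hat y_1^* - \hat y_2^* - d)}{\mathrm{Var}(\hat y_1^*) + \mathrm{Var}(\hat y_2^*)}.$$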


It is important to note that if d is not known, then (from (1.4)) it
follows that λ = 0 and the least squares estimate of β is merely $\tilde\beta = \hat\beta$.
This means that the constraint is unknown and must be estimated with the
unconstrained estimate $\hat\beta$. If d contains at least one known element and
at least one unknown element, then this is not always true.
A test of hypothesis which seems potentially very useful is a test for
"linearity" of data; i.e., a test to determine whether or not the data
are generated by a single straight line model over the entire range of the
independent variable. The test is easily conceptualized.

First, the bent line model, eq. (3.1) and eq. (3.2) with d = 0, is
assumed. The "bend" point (x*, y*) is then one point in common to the
straight line segments which compose the model. Then the hypothesis
is either $H_{01}$, that the two slopes in question are equal, or $H_{02}$,
that a point different from the bend point is common to the line segments
or their extensions.

Formally, the two hypotheses are:

$$H_{01}:\; b_1 = b_2,$$

$$H_{02}:\; a_1 + b_1(x^\circ - \bar x_1) = a_2 + b_2(x^\circ - \bar x_2).$$

The statistics for both tests are presented here. ($H_{02}$ involves $x^\circ = 0$.)
Unfortunately, both tests involve unwieldy formulae and can be recommended
only by the possible savings in degrees of freedom. The F-statistics are
of the form (1.27). To facilitate writing, let $B = (T\hat\beta - d)'[AA']^{-1}(T\hat\beta - d)$
and $C_j = [T^*\hat\beta - d^*]'[A^*A^{*\prime}]^{-1}[T^*\hat\beta - d^*]$, where $C_1$ and $C_2$ refer to $H_{01}$ and $H_{02}$,
respectively, and let $SS_E$ refer to the denominator of the F-test (1.27).
Then,


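$$C_j - B = \frac{\big[(T_0\hat\beta - d_0) - c_j\,(T\hat\beta - d)/w\big]^2}{v_j - c_j^2/w}, \tag{3.7}$$

where $w = T(X'X)^{-1}T'$ is given above, and $c_j = T(X'X)^{-1}T_0'$ and $v_j = T_0(X'X)^{-1}T_0'$
for the single row $T_0$ of hypothesis $H_{0j}$ (with $d_0 = 0$). For $H_{01}$, $T_0 = [0, 1, 0, -1]$, so that

$$c_1 = \frac{x^* - \bar x_1}{S_{xx1}} + \frac{x^* - \bar x_2}{S_{xx2}}, \qquad v_1 = \frac{1}{S_{xx1}} + \frac{1}{S_{xx2}};$$

for $H_{02}$, $T_0 = [1,\; x^\circ - \bar x_1,\; -1,\; -(x^\circ - \bar x_2)]$, so that

$$c_2 = \frac{1}{m_1} + \frac{(x^* - \bar x_1)(x^\circ - \bar x_1)}{S_{xx1}} + \frac{1}{m_2} + \frac{(x^* - \bar x_2)(x^\circ - \bar x_2)}{S_{xx2}}, \qquad v_2 = \frac{1}{m_1} + \frac{(x^\circ - \bar x_1)^2}{S_{xx1}} + \frac{1}{m_2} + \frac{(x^\circ - \bar x_2)^2}{S_{xx2}}.$$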
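Because each linearity hypothesis adds a single row to T, $C_j - B$ is a 2 × 2 quadratic form minus a scalar one, and it collapses to a single fraction. The sketch below (hypothetical names; g stands for $T\hat\beta - d$ and g0 for $T_0\hat\beta - d_0$) evaluates that fraction and, for comparison, the same quantity through the explicit inverse of the 2 × 2 matrix $T^*(X'X)^{-1}T^{*\prime}$. It is offered as a computational aid under these assumptions, not as the author's own formula.

```python
# Sketch: increment C_j - B when one hypothesis row T0 is appended to T.
# w = T V T', c = T V T0', v = T0 V T0' with V = (X'X)^-1; g = T b - d, g0 = T0 b - d0.

def hypothesis_increment(g, g0, w, c, v):
    """Closed form: (g0 - c g / w)^2 / (v - c^2 / w)."""
    return (g0 - c * g / w) ** 2 / (v - c * c / w)


def direct_increment(g, g0, w, c, v):
    """Same quantity via the explicit 2 x 2 inverse: q' W*^-1 q - g^2 / w."""
    det = w * v - c * c
    quad = (v * g * g - 2 * c * g * g0 + w * g0 * g0) / det
    return quad - g * g / w


def h01_terms(m1, sxx1, m2, sxx2, u1, u2):
    """w, c, v for H01 (T0 = [0, 1, 0, -1]); u_j = x* - xbar_j."""
    w = 1 / m1 + u1 ** 2 / sxx1 + 1 / m2 + u2 ** 2 / sxx2
    c = u1 / sxx1 + u2 / sxx2
    v = 1 / sxx1 + 1 / sxx2
    return w, c, v
```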
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On  setting 


4. A Procurement Example.

By law, a corporation which is desirous of selling goods or services
to the Department of Defense must become a party to a contract with the
US Government. The contract includes, among other things, an agreed-upon
price (cost plus profit) which the Government is obligated to pay for
the product. When the cost of satisfying the contract is uncertain or
technical uncertainty is high, an incentive feature and a "target" cost
and target profit are introduced. If the contractor's cost is lower
than the target cost, then a (previously established) percentage of the
savings is returned to the contractor as an increased profit. If, however,
the actual cost exceeds the target cost, then a percentage of this cost
growth is subtracted from the contractor's profit.

This study concerns only Cost Plus Incentive Fee (CPIF) type
contracts. The CPIF contract type must always state a maximum fee and
usually a minimum fee, while the Government pays for all allowable costs.
Figure 2 illustrates the relationship between cost and profit for this
contract type. Notice that the sum of cost and profit is called the price
(or total price), so that a broken line relationship also exists between
price and cost.

The following symbols will be used throughout:

  $\pi_T$ = target profit
  $\pi_M$ = maximum profit
  $\pi_m$ = minimum profit
  $C_T$ = target cost
  $C_L$ = least upper bound of all costs which should yield profit $\pi_M$
  $C_U$ = greatest lower bound of all costs which should yield profit $\pi_m$
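The broken-line cost-profit relationship of a CPIF contract can be sketched as a clamped straight line. The contractor's share of savings or overruns (`share` below) is a hypothetical parameter standing for the "previously established percentage" in the text; the breakpoints $C_L$ and $C_U$ then follow from where the line meets the maximum and minimum profit.

```python
# Sketch of the CPIF cost-profit relationship (hypothetical parameter names):
# profit = target profit + share * (target cost - cost), clamped to [min, max].

def cpif_profit(cost, target_cost, target_profit, share, max_profit, min_profit):
    raw = target_profit + share * (target_cost - cost)
    return min(max_profit, max(min_profit, raw))


def cost_breakpoints(target_cost, target_profit, share, max_profit, min_profit):
    """C_L (profit reaches its maximum) and C_U (profit reaches its minimum)."""
    c_l = target_cost - (max_profit - target_profit) / share
    c_u = target_cost + (target_profit - min_profit) / share
    return c_l, c_u


# Example: target cost 100, target profit 7, 20% contractor share, fee range 3 to 12.
c_l, c_u = cost_breakpoints(100.0, 7.0, 0.2, 12.0, 3.0)   # 75 and 120
```

Adding cost to profit gives the corresponding broken-line price-cost relationship mentioned above.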


FIGURE 2: Relationship Between Cost and Profit for a CPIF Contract (vertical axis: Profit)


The data set consists of 29 randomly selected CPIF contracts with
target price $375,000 or more which were definitized after 1963 and
completed prior to September 1971. The data were normalized to unit target
cost and target profit and unit difference between the maximum profit and
target profit and between the target profit and minimum profit. The costs
were similarly transformed. These transformations are linear, or strictly
piecewise linear if the original relationship is not symmetric with respect
to the point ($C_T$, $\pi_T$). The normalized data are presented in Figure 3.


FIGURE 3: Normalized Sample Data (vertical axis: Normalized Profit)


Each data point should (theoretically) lie on the dashed line, but
for various reasons, variation is introduced into the system. If the
contract incentive feature is to properly motivate the contractor, then
at least the expected normalized profit should coincide with the dashed
line. To test this hypothesis, the piecewise general linear model
procedure outlined in the preceding sections was used.

Notice that the two points in figure 3 which lie to the left of the
origin exhibit no variation. There is (apparently) reason to believe that
this will always be the case for points to the left of the vertical axis
and to the right of the vertical line at … in figure 3.

There is, however, considerable variation for the points to the right.
Expert advice could not resolve this issue. Therefore, the two leftmost
points were discarded in this analysis.


. 

/ 

i 

'  -  .  •  . *9 _ 

1 

' 

‘  c  c  ?  ^ 

1  0 

i 

■  j 

i 

/%  ^  ^ 

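The data yield the following: $\hat a_1 = 1.09$, $\hat b_1 = -.84$, $\hat a_2 = .35$, $\hat b_2 = .6$;
$\tilde a_1 = 1.08$, $\tilde b_1 = -.88$, $\tilde a_2 = .38$, $\tilde b_2 = .42$, $\tilde\sigma^2 = .13375$. Also: $T(X'X)^{-1}T' = 1.445$
and $(T\hat\beta - d)'(AA')^{-1}(T\hat\beta - d) = .036$, and

$$T^*(X'X)^{-1}T^{*\prime} = \begin{bmatrix} 1.45 & .29 & -.2 & 1.35 \\ .29 & .26 & 0 & 0 \\ -.2 & 0 & .2 & 0 \\ 1.352 & 0 & 0 & 2.08 \end{bmatrix}, \qquad [T^*(X'X)^{-1}T^{*\prime}]^{-1} = \begin{bmatrix} 22.5 & -25.2 & 22.6 & -14.6 \\ -25.2 & 32.2 & -25.4 & 16.4 \\ 22.6 & -25.4 & 27.7 & -14.7 \\ -14.6 & 16.4 & -14.7 & 10.0 \end{bmatrix}.$$

Finally, $[T^*\hat\beta - d^*]' = [.189,\; .12,\; .35,\; .6]$ and $F_{3,24} = 3.68^{**}$.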
EXPERIMENTAL ESTABLISHMENT OF ACCURACY OF
RANGE-TO-FUNCTION MEASUREMENTS FOR ARTILLERY PROJECTILES


1LT  L.D.  Clements 
Data  Reduction  Section 
Yuma  Proving  Ground 
Yuma,  Arizona 


FIELD MEASUREMENT OF AIRBURST LOCATION. An accurate means of determining
the range-to-function (slant range) is a necessity in testing of
artillery fuzing mechanisms. Precise location of ground impacts is not
particularly difficult, but exact measurement of airbursts is often a
definite problem. Various means are available for locating airbursts
directly. The most common is the use of observers and some form of transit
to locate the smoke signature by triangulation (digital transit or
cinetheodolite being the most accurate). Another direct location method
which has been proposed is the use of acoustic sensors to locate a point
sound source, but the reliability of the acoustic method is questionable.

An indirect means of obtaining the slant range to function is to use
the time-velocity record from a Doppler velocimeter and numerically integrate
to get slant range. Although the numerical techniques involved are
explained more fully in Brittain (1966) and in Clements (1973), briefly
the process is this. During the time interval when the Doppler is locked
onto the round, successive radial velocity readings are averaged and multiplied
by the time interval between readings. The resulting distances
traveled in each time increment are summed to give an estimate of the
distance traveled during the locked-on period. The distance the shell
traveled before the Doppler locked on is estimated using the muzzle velocity,
the first Doppler-measured velocity, and the time interval between
tube exit and lock-on. Since the Doppler break-track coincides with shell
function, the sum of the distances traveled before lock-on and from lock-on
to function is the slant range to function. This direct numerical
integration is quite good at low gun elevations, and a mathematical routine
to calculate actual shell tangential velocities for use in the integration has
been developed.
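The averaging-and-summing procedure described above is trapezoidal integration of the velocity record. The sketch below assumes times are measured from tube exit, takes the pre-lock-on distance as the average of muzzle velocity and first Doppler velocity times the lock-on delay (one plausible reading of the text), and ignores the radial-versus-tangential velocity correction; the function name is hypothetical.

```python
# Sketch of the Doppler slant-range integration (trapezoidal rule).
# times[i] are seconds after tube exit (times[0] = lock-on), velocities[i] in m/s.

def slant_range_to_function(muzzle_velocity, times, velocities):
    # Distance before lock-on: mean of muzzle and first Doppler velocity times the delay.
    pre_lock = 0.5 * (muzzle_velocity + velocities[0]) * times[0]
    # Distance while locked on: averaged successive readings times each time step.
    tracked = sum(0.5 * (velocities[i] + velocities[i + 1]) * (times[i + 1] - times[i])
                  for i in range(len(times) - 1))
    return pre_lock + tracked  # last reading = break-track = fuze function


# Example: linearly decelerating round, v(t) = 800 - 20 t, tracked until t = 10 s.
times = [0.5 * i for i in range(1, 21)]
velocities = [800.0 - 20.0 * t for t in times]
slant = slant_range_to_function(800.0, times, velocities)   # 7000 m
```

For a velocity record that is linear in time the trapezoidal sum is exact, so the example reproduces the analytic integral.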

At Yuma Proving Ground, use of observers is the most common means of
acquiring slant range data, with use of the Doppler enjoying increasing
interest. Unfortunately, with both acquisition methods, the precision of
measurement is known but the actual accuracy of measurement is unknown.

The  observers  only  occasionally  are  able  to  catch  the  flash  of  light 
accompanying  the  function,  and  more  generally  are  sighting  on  the  tell-tale 
puff  of  smoke.  The  relation  of  the  event  measured  to  the  actual  fuze 
function  is  not  known.  Similarly,  the  function  point  on  the  Doppler 
record  is  evidenced  by  a  relatively  sudden  loss  of  track.  Again,  the 
relation  of  this  break-track  point  to  the  actual  function  point  is 
unknown. 
PHOTOGRAPHIC AIRBURST REGISTRATION. In order to obtain an accurate
standard against which the other slant range determinations could be compared,
photographic techniques were employed. The method used was to
emplace a bank of cameras at a point downrange such that each camera could
look at a rectangular window along the line of fire (see Figure 1). The
nominal center of bursts was obtained by plotting data from previous tests
and locating the ground range where most of the functions took place.

The cameras were located along a line parallel to the nominal line of
fire, at a distance of 750 meters from the line of fire. A bank of four
cameras was used. Each camera, a Milliken 35mm framing camera with a 40-inch
lens, was aimed normal to the line of fire. The cameras were spaced
at 17-meter intervals to ensure overlap in the fields of view, and the aim
points were stair-stepped upward to follow the trajectory. The "windows"
described by the cameras appeared as shown in Figure 2. Time correlation
among the several acquisition media was provided by the Proving Ground
range timing facility.

Actual data collection was extremely simple. As indicated above,
the cameras were prepositioned based upon previous experimental data, so
no major adjustments were possible. The round-by-round collection sequence
consisted only of listening for the sound of the firing over the intercom
system, delaying for an appropriate time, and starting the cameras. The
cameras were allowed to run for two to three seconds after the sound of
the burst was noted. For maximum contrast, high-speed color film was used.
Physical operating limitations consisted of a need for clear skies, preferably
with the sun in a position to provide back lighting, and minimal winds.

Data from the cameras were reduced by locating the smoke puff on a frame
and backing off through earlier frames until either the puff was no longer
visible or the flash from the fuze function was observed. The location of
the function point relative to the center of the optics was calculated by
ratio and proportion (see Figure 3). Then, knowing the location of the burst
along the nominal line of fire and the deflection of the round (from
observers), the coordinates of the burst could be calculated. Observed slant
range was then calculated from the burst coordinates and the gun
coordinates.
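The ratio-and-proportion step scales a displacement measured on the film back to the object plane by the same similar-triangles factor, distance over focal length. A minimal sketch follows. It assumes a gun-centred coordinate system (x downrange along the nominal line of fire, y deflection, z height), takes the camera-to-burst distance as the nominal 750 m (ignoring the small change caused by deflection), and uses made-up aim-point and film readings. All function names and numbers are hypothetical.

```python
import math

FOCAL_MM = 1016.0      # 40 inch lens
DISTANCE_M = 750.0     # nominal camera-to-line-of-fire distance (assumed exact)


def film_to_object_m(offset_mm: float) -> float:
    """Scale a film displacement (mm) to object-plane metres by ratio and proportion."""
    return offset_mm * DISTANCE_M / FOCAL_MM


def burst_coordinates(aim_x_m, aim_z_m, film_dx_mm, film_dz_mm, deflection_m):
    """Burst position from a camera aim point, film offsets and observed deflection.

    Assumed sign convention: positive film_dx_mm is downrange, positive
    film_dz_mm is upward, both measured from the center of the optics.
    """
    x = aim_x_m + film_to_object_m(film_dx_mm)   # ground range along line of fire
    z = aim_z_m + film_to_object_m(film_dz_mm)   # height above the gun datum
    return (x, deflection_m, z)


def slant_range_m(burst, gun=(0.0, 0.0, 0.0)):
    """Straight-line distance from the gun to the burst."""
    return math.dist(burst, gun)


# Hypothetical reading: aim point 3000 m downrange at 200 m height,
# puff 5 mm downrange of and 2 mm below the optical center, 4 m deflection.
burst = burst_coordinates(3000.0, 200.0, 5.0, -2.0, 4.0)
print(burst, slant_range_m(burst))
```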

DATA  ANALYSIS.  Sample  data  comparing  observer  and  camera  values  are 
given  in  Table  1.  "Extrapolated"  data  are  those  events  where  the  actual 
burst  was  not  within  the  view  field,  but  was  close  enough  to  be  estimated 
from  the  smoke  pattern.  The  mean  error  for  each  of  the  quantities  gives 
some  idea  of  the  overall  accuracy  of  the  measurement,  while  the  unbiased 
estimate  of  the  standard  deviation  can  be  taken  to  indicate  the  precision. 
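The two summary statistics named here are the mean of the paired differences (a measure of bias, i.e. accuracy) and their sample standard deviation with the usual n - 1 divisor (precision). The sketch uses Python's `statistics` module. The slant-range pairs are invented for illustration and are not the values of Table 1, and treating the camera value as the reference is an assumption.

```python
from statistics import mean, stdev

# Hypothetical slant ranges (m) for the same rounds; NOT the data of Table 1.
observer = [3012.0, 2987.0, 3025.0, 3001.0, 2994.0]
camera   = [3004.0, 2990.0, 3011.0, 2998.0, 2989.0]

# Assumption: error = observer value minus camera value (camera as reference).
errors = [o - c for o, c in zip(observer, camera)]

mean_error = mean(errors)   # bias: overall accuracy of the measurement
sd_error = stdev(errors)    # n - 1 divisor: precision of the measurement

print(f"mean error {mean_error:.2f} m, standard deviation {sd_error:.2f} m")
```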

Similar statistics for the Doppler data may be developed to provide some
basis for comparison between the two methods, perhaps through an F-test.
Also, as more data are acquired, the mean errors and estimated standard
deviations may be refined and the basis for comparison between acquisition
methods strengthened.
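An F-test of this kind compares the two error variances: the ratio of the larger sample variance to the smaller is referred to an F distribution with (n1 - 1, n2 - 1) degrees of freedom. The sketch below only forms the ratio and the degrees of freedom; the critical value must come from F tables (or, for example, `scipy.stats.f.ppf` if SciPy is available). The error samples are hypothetical, and the test presumes independent, roughly normal errors.

```python
from statistics import variance


def f_ratio(errors_a, errors_b):
    """Return (F, df_num, df_den) with the larger sample variance in the numerator."""
    va, vb = variance(errors_a), variance(errors_b)
    if va >= vb:
        return va / vb, len(errors_a) - 1, len(errors_b) - 1
    return vb / va, len(errors_b) - 1, len(errors_a) - 1


# Hypothetical measurement errors (m) of the two acquisition methods.
camera_err  = [8.0, -3.0, 14.0, 3.0, 5.0]
doppler_err = [2.0, -1.0, 4.0, 0.0, 1.0, 3.0]

F, df1, df2 = f_ratio(camera_err, doppler_err)
print(f"F = {F:.2f} with ({df1}, {df2}) degrees of freedom")
```

For a two-sided test at level alpha, F is compared with the upper alpha/2 point of the F(df1, df2) distribution, since the larger variance was placed on top.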
SUMMARY. This short clinical paper is intended to show an approach to
fulfilling an ever-present need in testing operations, that of assigning
reliable accuracies to experimental data. The example cited is a real one.
The analysis of the data, though simple, allows realistic bounds to be
placed on data accuracy requirements and permits comparison of redundant
data acquisition methods for future applications.
Figure 2: Example of Camera Windows


AN  IMPROVED  METHOD  OF  ESTIMATING  THE  CRITICAL 
VELOCITY  OF  A  PROJECTILE  IN  PENETRATION  BALLISTICS 


G.  J.  McLaughlin 
DEFENCE  RESEARCH  ESTABLISHMENT 
VALCARTIER, CANADA


ABSTRACT. In recent years, many studies have been done on the relative
merits of several methods of fitting the logistic and especially the normal
distribution functions as dosage response curves. Most assessments have
been made only for sensitivity experiments where the stimulus had no random
fluctuations around the chosen test levels. The purpose of this study is to
assess the relative efficiency of some of the methods based on the 'up and
down' sampling technique in an experiment where the stimulus has random
variations around some fixed levels. One of those methods has been found
more efficient than the one currently used to determine the critical
velocity of a projectile.

NOTATION.

V: The dosage or stimulus in general; the striking velocity in the case of
tests to determine ballistic limits.

σ: A parameter measuring the spread of tolerances in the response curve,
usually called the standard deviation.

D: Step by which the stimulus (velocity) is increased or decreased depending
on whether the previous trial was a failure or a success (in units of σ).

K: Error of estimation for the starting value of V (in units of σ).

S: Standard deviation of the stimulus at each level (in units of σ).

N: Minimum number of observations required for one determination of the
50% point, V50.

NR: Number of determinations of the 50% point for a given set of conditions.

R: Allowable spread for N/2 successes and N/2 failures according to
Method B (in units of σ).

RMS: Root mean square error of the NR determinations of the 50% point.

G: A random value from a normal distribution with mean 0 and variance 1.

U: A random value from a uniform distribution between 0 and 1.

N0: Average number of observations used to approach V50 in Method A, but
not included in the sample of N.

1.0 INTRODUCTION. In the last twenty years, there has been much discussion
of the relative merits of several methods of fitting the normal integral or
the logistic integral response curves to sensitivity data, i.e., to data
obtained from experiments in which an increasing proportion of items either
fail, explode or die as the severity of the test is increased. In such an
experiment the severity of test which would barely produce a failure cannot
be measured exactly; one can only observe whether an applied severity
produces a failure or not. Sensitivity data involve responses which can be
either positive or negative, and which are observed at different levels of
some variable of interest. The response is said to be "quantal" because it
is measured not in terms of a continuous scale such as weight or length but
in terms of the observed proportion that is positive. The main purpose in
analyzing sensitivity data is to estimate the 50% point, i.e. the level of
the variable for which the positive and negative responses are equally
likely.

In some methods of sensitivity testing, like the Probit, Normit or Logit
methods, the experimenter chooses in advance the stimulus levels to be
applied and the number of observations at each level. In other methods,
like those based on the 'Up and Down' sampling technique, the choices are
made sequentially, as the experiment progresses. Up to now, most methods
have been applied and their efficiency tested only in the case where the
stimulus levels are free of random errors. The object of this study is to
assess the relative efficiency of the various methods of sensitivity testing
when the dosage or stimulus is subject to random errors around the levels
chosen.

After a brief description of the general sensitivity problem, two methods
of estimating the 50% point of the stimulus variable are described, together
with a Monte Carlo simulation procedure used to assess their relative
accuracy. The two methods described here use the 'Up and Down' sampling
technique to gather the data, but they involve different estimation
procedures for the 50% point.


2.0 THE PROBLEM OF SENSITIVITY TESTING. The cumulative normal distribution
function has been used extensively in bioassay and in other sensitivity
experiments because the probability of some all-or-none response is a
monotonic non-decreasing function of a quantity V which measures the potency
of the agent producing the response. The occurrence or non-occurrence of the
response in a particular individual depends on whether or not the dose
exceeds the tolerance value for that individual. Individual tolerances are
assumed to have a normal frequency distribution in the population. Therefore
the probability that a subject chosen at random from the population will
respond to a dose V is given by

$$P = \int_{-\infty}^{W} (2\pi)^{-1/2} \exp\left(-t^{2}/2\right)\, dt \qquad (1)$$

where W = (V - V50)/σ, V50 is the value of V corresponding to P = 0.5, and
σ is the standard deviation of tolerances in the response function. In a
sensitivity experiment, a two-category response is observed to determine the
effect of different levels of the dose or stimulus V. Each experiment has the
goal of estimating the value of the variable for which the two responses
occur with equal probability, i.e. V50, which is also the mean of the normal
distribution. The estimation of the 50% point is more desirable than that of
other percentage points for two main reasons. First, it can be determined
more accurately with a reasonable number of observations. Second, it
provides the most satisfactory basis for comparison, because for two
distribution curves with the same mean but with different values of σ, the
only percentage point in common is the 50% one.
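Equation (1) is the standard normal distribution function evaluated at W = (V - V50)/σ, which Python's `math.erf` gives in closed form. The short sketch below (function name and numeric values are illustrative only) also shows that V25 and V75 lie about 0.674σ either side of V50, a fact used in Sections 5.3 and 5.4 when quoting the confidence that an estimate falls between them.

```python
import math
from statistics import NormalDist


def response_probability(v: float, v50: float, sigma: float) -> float:
    """P(response) at stimulus v under the normal (probit) response curve, eq. (1)."""
    w = (v - v50) / sigma
    return 0.5 * (1.0 + math.erf(w / math.sqrt(2.0)))


V50, SIGMA = 2000.0, 50.0              # hypothetical ballistic limit and spread (ft/s)
z75 = NormalDist().inv_cdf(0.75)       # about 0.6745
v25, v75 = V50 - z75 * SIGMA, V50 + z75 * SIGMA

print(response_probability(V50, V50, SIGMA))   # 0.5 at the 50% point
print(response_probability(v75, V50, SIGMA))   # about 0.75
```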


3.0 PURPOSE OF THE STUDY. In recent years, many studies have been done on
the relative merits of several methods of fitting the logistic and
especially the normal distribution functions as dosage response curves. The
first standard techniques of analysis were those called the Logit or Probit
methods, depending on whether the assumed response curve was the logistic or
the normal distribution function. They are fully described in References 1
and 2 respectively. Several alternative methods for estimating the 50% point
were later suggested, either because they involve less computation or
because their validity may depend to a lesser extent on the choice of
response curve. Most of these recent methods are based on the 'Up and Down'
sampling technique. They are purely arithmetical processes that use the
observed responses independently of what the true functional form of the
response curve, P, may be. Their merits for any given set of data depend
upon the particular form of P that applies. These alternative methods have
been described and evaluated in References 3 to 6. Unfortunately, all
assessments have been made only for sensitivity experiments where the
stimulus had no random fluctuations around the chosen test levels.

The  purpose  of  this  study  is  to  generalize  two  of  the  methods  mentioned 
previously  and  to  assess  their  efficiency  in  an  experiment  where  the  stimulus 
has  random  variations  around  some  fixed  levels.  This  is  the  case  in  tests  to 
determine  the  critical  velocity  of  a  projectile  to  defeat  a  target  since 
sampling  variations  in  velocity  occur  for  a  fixed  weight  of  propellant.  Two 
methods  to  determine  the  critical  velocity  of  a  projectile  or  equivalently 
the  ballistic  limits  of  its  corresponding  armoured  target  are  described  herein 
and  their  efficiency  is  assessed  for  various  combinations  of  the  parameters 
involved. 

As far as the authors are aware, Reference 7 is the only existing study on
the efficiency of sensitivity testing methods for obtaining the critical
velocity. However, most of the methods suggested in it are a subset of
Method B given in the present study and were evaluated for some particular
cases only.


4.0 THE METHODS OF SENSITIVITY TESTING USED FOR DETERMINING CRITICAL
VELOCITIES

4.1 General. In experiments to estimate the sensitivity of armour plate to
projectile velocity, a common procedure is to fire a given type of
projectile at various velocities against a given armour plate. Obviously,
there are velocities at which some projectiles will perforate the armour and
others will not. It is assumed that those which do not defeat the plate
would do so were the projectiles fired with a sufficiently larger velocity.
It is therefore assumed that there is a critical velocity, V50, above which
a success (defeat of the plate) is more likely and below which a failure is
more likely. This critical velocity is the velocity corresponding to 50%
successes and 50% failures. On account of the symmetry of the normal
distribution, the median velocity, V50, is the same as the mean of the
normal integral response function of Section 2. In the case of tests to
determine the critical velocity, the parameter σ in the response function
measures the spread of tolerances of a type of armour with respect to the
striking velocities of a given type of projectile. It should not be confused
with the parameter S, which measures the spread in striking velocity
corresponding to a fixed weight of propellant.
It is assumed that the probability of response (defeat of target), P, to the
stimulus (striking velocity), V, is given by an integrated normal curve with
parameters V50 and σ. According to Reference 7, this assumption is supported
by results of tests involving the firing of a considerable number of rounds
at one target.

4.2 DESCRIPTION OF THE METHODS. The purpose of the two methods studied here
is to estimate V50 under the assumption that the normal response function is
valid.

For the two methods, an attempt is made, by adjusting the weight of
propellant, to fire the first round at a velocity which is the best estimate
of V50 known to the experimenter, say V1. Each subsequent round is fired
according to the 'Up and Down' firing technique, increasing the velocity by
D (in units of σ) for any round following a failure, and decreasing it by D
for any round following a success. The series of velocities used in this
'Up and Down' experiment forms a stochastic process whose main feature is
that the velocities tend to have a distribution concentrated around V50.
Method A uses the fact that an initial run of responses of the same sign is
an indication that the first velocity was badly chosen. If the initial run
of constant sign contains N0+1 rounds, another N-1 rounds are fired. In this
case, the estimator is


$$V_A = \frac{1}{N+1}\left[\sum_{i=N_0+1}^{N_0+N} V_i + V_{N_0+N} \pm D\right] \qquad (2)$$


where the sign associated with D is positive if the last round was a failure
and negative if it was a success. Method B requires that firing continue
until N/2 successes and N/2 failures are achieved within a range of
velocities of R units. The estimator VB suggested by this method is the
arithmetic mean of the N velocities corresponding to the N/2 successes and
N/2 failures.
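Method A's bookkeeping is easy to get wrong by one round, so a small sketch of equation (2) may help. It assumes the complete firing record (N0 + N rounds) is available as a list of striking velocities with a parallel list of outcomes (True for a success, i.e. the plate defeated). N0 is one less than the length of the opening run of identical outcomes; the N velocities from round N0 + 1 onward are averaged together with the level the next round would be fired at, V_{N0+N} ± D. The function names are hypothetical.

```python
def initial_n0(outcomes):
    """N0: one less than the length of the opening run of identical outcomes."""
    run = 1
    while run < len(outcomes) and outcomes[run] == outcomes[0]:
        run += 1
    return run - 1


def method_a_estimate(velocities, outcomes, step):
    """Equation (2): mean of V_{N0+1}..V_{N0+N} and the next up-and-down level.

    Assumes the record is complete, i.e. holds exactly N0 + N rounds.
    """
    n0 = initial_n0(outcomes)
    sample = velocities[n0:]          # rounds N0+1 .. N0+N (0-based slice)
    n = len(sample)
    # +D after a failure (False), -D after a success (True)
    next_level = velocities[-1] + (-step if outcomes[-1] else step)
    return (sum(sample) + next_level) / (n + 1)


# Hypothetical record in units of sigma (V50 = 0, D = 1):
# two failures open the record, so N0 = 1 and N = 6.
v = [0.0, 1.0, 2.0, 1.0, 2.0, 1.0, 0.0]
hit = [False, False, True, False, True, True, False]
print(method_a_estimate(v, hit, 1.0))
```

Here the sample is rounds 2 to 7 (velocities summing to 7) plus the next level 0 + 1, giving 8/7 as the estimate.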


4.3 SIMULATION OF THE METHODS. No attempt was made to assess the relative
merits of these methods from actual firing data, because such a procedure
would have involved a tremendous number of rounds, besides losing its
generality through its association with specific weapons. The error inherent
in each method of estimating the 50% point was evaluated using Monte Carlo
techniques. A program was written to simulate on a computer the complete
firing procedure for the two estimation methods.

Without any loss of generality for the present study, the variables were
scaled in such a way that V50 = 0 and σ = 1 in the response function. The
velocity of the jth round for the two methods can therefore be simulated
using the following equation:

$$V_j = K + G_j S + D \sum_{i=1}^{j} C_i, \qquad j = 1, 2, \ldots, N + N_0 \qquad (3)$$

where

K = error of estimation for the starting value of V
G_j = a random value from a normal distribution with zero mean and unit standard deviation

$$C_j = \begin{cases} 0 & j = 1 \\ -1 & U_{j-1} \le P_{j-1}, \quad j = 2, \ldots, N + N_0 \\ +1 & U_{j-1} > P_{j-1}, \quad j = 2, \ldots, N + N_0 \end{cases}$$

U_j = a uniformly distributed random real number between 0 and 1
P_j = value of the response function when V = V_j


Each group of (N+N0) velocities thus calculated yields one estimate of V50,
obtained by averaging according to the appropriate formula of Section 4.2.
This process is repeated as many times as required to allow the computation
of an RMS error for each method and for each combination of parameters. A
complete description of the simulation programs is given in Appendix A of
Reference 8.
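A compact version of the simulation can be written directly from equation (3): the aimed level starts at K and moves by ±D after each round, the realized striking velocity adds G_j S to that level, and a round succeeds when U ≤ P(V_j). The sketch below is our reconstruction for Method A only, with V50 = 0 and σ = 1; it is not the program of Reference 8, and details such as applying ±D to the realized (rather than aimed) last velocity in equation (2) are assumptions.

```python
import math
import random


def phi(x: float) -> float:
    """Standard normal distribution function (response curve with V50 = 0, sigma = 1)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def simulate_method_a(n: int, d: float, s: float, k: float, rng: random.Random) -> float:
    """One Method A determination of V50 (true value 0) following equation (3)."""
    if n < 2:
        raise ValueError("Method A needs N >= 2")
    level = k                         # K + D * sum(C_i): the aimed velocity level
    velocities, outcomes = [], []
    n0 = None                         # unknown until the opening run ends
    while n0 is None or len(velocities) < n0 + n:
        v = level + rng.gauss(0.0, 1.0) * s     # add G_j * S
        success = rng.random() <= phi(v)        # U <= P(V_j): target defeated
        velocities.append(v)
        outcomes.append(success)
        if n0 is None and success != outcomes[0]:
            n0 = len(outcomes) - 2              # opening run held N0 + 1 rounds
        level += -d if success else d           # next C is -1 or +1
    sample = velocities[n0:]                    # the N rounds kept by Method A
    # assumption: +/-D is applied to the realized last velocity
    next_level = velocities[-1] + (-d if outcomes[-1] else d)
    return (sum(sample) + next_level) / (n + 1)


def rms_error(n, d, s, k, reps=3500, seed=12345):
    """RMS error of `reps` Method A determinations (true V50 = 0)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        est = simulate_method_a(n, d, s, k, rng)
        total += est * est
    return math.sqrt(total / reps)


print(rms_error(n=15, d=1.0, s=0.5, k=2.0))
```

Because V50 = 0 after scaling, the RMS error is simply the root of the mean squared estimate; a fixed seed makes each parameter combination reproducible.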


4.4 RANGE OF PARAMETERS AND TYPE OF RESPONSE FUNCTION. An RMS error based on
3500 determinations of the 50% point has been computed for each method and
for all combinations of the following values of the parameters: D = 0.5, 1,
2; S = 0, 0.5, 1; K = 0, 1, 2, 4. The values of N were 6, 9, 12 and 15 for
Method A. In the case of Method B, which is based on the first X successes
and X failures within a velocity spread of R, the RMS error has been
evaluated for the following pairs of (X,R) values: (2,2), (3,2), (5,2),
(2,3), (3,3) and (5,3).

The most frequently used functional forms for the probability of response to
a stimulus are the normal and the logistic distribution functions. Since the
curves corresponding to those functions are almost identical, only the
normal integral has been used as the response function throughout this
study.

The RMS error calculated with the simulation program has been expressed in
units of σ, as were D, S, K and R. The RMS error of each method is given in
the first two Tables (Tables A1 and B1) for the combinations of parameters
mentioned previously.

The RMS errors of Method B for D = 1, (X,R) = (5,2) and (3,3) check with
those of Figure 10, Reference 7, when interpolating over S in Table B1.


5.0  DATA  REDUCTION  AND  ANALYSIS. 

5.1 WEIGHTING OVER THE PARAMETERS. Since the purpose of this study is to
find the best method of sensitivity testing for critical velocity
determination, the RMS errors have been averaged over the various parameters
using the weighting system which appeared the most realistic in critical
velocity estimation problems.

The right portion of Tables A1 and B1 gives, for each method, the RMS error
averaged over K according to normal distributions with a common mean of 0
but different standard deviations σK = 1.5, 2.5 and 4.0. This means that the
initial velocity estimate is assumed to follow a normal distribution
centered on the true V50 with standard deviations of 1.5, 2.5 and 4.0 times
the basic parameter σ. The value σK = 2.5 is believed to be more appropriate
for most applications, unless a preliminary "feeler" round is fired to
improve the accuracy of the initial estimate of V50, in which case the value
σK = 1.5 appears more realistic. This is in agreement with the set of
starting velocities used in Reference 7, which corresponds to values of σK
ranging between 1 and 2 in units of σ, with one "feeler" round to improve
the starting velocity.

The average number of observations used in approaching V50 for Method A, but
not included in the sample of N, is designated N0 and given in Table A1.

The average sample size, N+N0, on which each RMS error of Method B is based,
is given in Table B1.

The best distribution for S to cover the range of weapons is not known, but
since S is definitely heavily concentrated around 0.5, a triangular
distribution between 0 and 1 centered at 0.5 has been assumed to be
realistic for S. Therefore the RMS values of Tables A1 and B1, already
weighted over K according to normal distributions with σK = 1.5, 2.5 and
4.0, have been averaged over S according to the set of weights corresponding
to a triangular distribution of S between 0 and 1 with center at 0.5, that
is 0.125, 0.75 and 0.125 for S = 0, 0.5 and 1, respectively. The resulting
RMS errors weighted over K and S are given in Tables A2 and B2.
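The weights 0.125, 0.75 and 0.125 are exactly what one gets by giving each tabulated S the probability mass of a triangular distribution on [0, 1] (mode 0.5) that lies nearest to it, i.e. over the bins [0, 0.25], [0.25, 0.75] and [0.75, 1]. The sketch below reproduces them; the midpoint binning is our reading of how the weights arose, not something the paper states.

```python
def triangular_cdf(x: float, a: float = 0.0, c: float = 0.5, b: float = 1.0) -> float:
    """CDF of the triangular distribution with support [a, b] and mode c."""
    if x <= a:
        return 0.0
    if x <= c:
        return (x - a) ** 2 / ((b - a) * (c - a))
    if x < b:
        return 1.0 - (b - x) ** 2 / ((b - a) * (b - c))
    return 1.0


def midpoint_bin_weights(grid, cdf):
    """Probability mass nearest to each grid point (bins split at midpoints)."""
    edges = [float("-inf")]
    edges += [(lo + hi) / 2.0 for lo, hi in zip(grid, grid[1:])]
    edges += [float("inf")]
    return [cdf(hi) - cdf(lo) for lo, hi in zip(edges, edges[1:])]


s_weights = midpoint_bin_weights([0.0, 0.5, 1.0], triangular_cdf)
print(s_weights)
```

Whether the tabulated RMS values themselves or their squares (giving the square root of the weighted mean square) were averaged is not stated; the weights are the same either way.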

5.2 SELECTION OF THE OPTIMUM STEP LEVEL. Because the sample size varies, it
is not obvious from Tables A2 and B2 which level of D is optimum for the two
methods. A graphical comparison of the RMS error associated with each level
of D is made in Figures 1 to 6. The curves indicate that the optimum level
of D is 1 for both Method A and Method B, whether σK is 2.5 or 4.0. From
Figures 1 and 2 of Reference 8, this is also true for Method A when σK is
1.5, but not for Method B, for which D = 0.5 is better. Therefore a value of
one σ for D should be aimed at, since it is associated with greater accuracy
for both methods over the values of σK likely to be met in practice.

5.3 COMPARISON OF THE METHODS. The curves plotted in Figure 7 indicate
clearly that Method A is superior to Method B, with R equal to either 2 or
3, when σK = 2.5 and the step level D is one. The same conclusion can be
drawn from Figures 8 and 9 for σK = 4.0 and 1.5 respectively. Therefore
Method A is definitely more accurate than Method B for σK = 1.5, 2.5 and 4.0
when the step level D is at its optimum value of one σ.

In a critical velocity test, unfortunately, the parameter σ is not known in
advance. It is therefore necessary to use for σ a reasonable estimate, σ̂,
based upon experience. If no such estimate is available, the value of
50 ft/s, which is recommended in Reference 7, appears to be realistic. It is
assumed here that σ̂ has a normal distribution with mean σ and standard
deviation σ/4. Therefore, since the step D is taken equal to σ̂, it also has
a normal distribution with mean σ and standard deviation σ/4. The RMS error
values of Tables A2 and B2 were averaged over D using the set of weighting
factors corresponding to this distribution, and the resulting RMS values are
given in Tables A3 and B3. The weights were 0.16, 0.82 and 0.02 for D equal
to 0.5, 1.0 and 2.0 respectively. The confidence that the V50 estimate lies
between the true V25 and V75 is also given in Tables A3 and B3, for Methods A
and B respectively. The RMS error and the confidence in the V50 estimate are
plotted in Figure 10 for both methods when σK is 2.5 and D is normally
distributed with mean 1 and standard deviation 0.25. The curves in Figure 10
illustrate again the superiority of Method A over Method B.
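The weights 0.16, 0.82 and 0.02 for D = 0.5, 1.0 and 2.0 can be reproduced by assigning to each tabulated D the probability mass, under a normal distribution with mean 1 and standard deviation 0.25 (in units of σ), that lies nearest to it, i.e. bins split at the midpoints 0.75 and 1.5. The binning rule is our inference from the published numbers, not a statement from the paper.

```python
import math


def normal_cdf(x: float, mu: float = 0.0, sd: float = 1.0) -> float:
    """Normal distribution function; math.erf maps -inf/inf to -1/1."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sd * math.sqrt(2.0))))


def midpoint_bin_weights(grid, cdf):
    """Probability mass nearest to each grid point (bins split at midpoints)."""
    edges = [-math.inf]
    edges += [(lo + hi) / 2.0 for lo, hi in zip(grid, grid[1:])]
    edges += [math.inf]
    return [cdf(hi) - cdf(lo) for lo, hi in zip(edges, edges[1:])]


# D = sigma_hat / sigma, assumed N(1, 0.25^2) as in the text.
d_weights = midpoint_bin_weights([0.5, 1.0, 2.0], lambda x: normal_cdf(x, 1.0, 0.25))
print([round(w, 2) for w in d_weights])
```

The three masses are Φ(-1), Φ(2) - Φ(-1) and 1 - Φ(2), which round to the quoted 0.16, 0.82 and 0.02.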


5.4 ACCURACY vs SAMPLE SIZE FOR METHOD A. The authors are not aware of any
agreed level of accuracy required from a sensitivity testing method.
However, a level of accuracy such that one is 90% confident that a V50
estimate is between V25 and V75 is considered desirable and realistic.
Assuming that the error, K, in the initial value of V is normally
distributed with mean 0 and standard deviation σK = 2.5 (an acceptable
assumption when no preliminary "feeler" round is used), a confidence level
of 90% would require a total sample size of 16.48 on average with Method A.
This total sample is made up of N0+1 = 2.48, which is the average length of
the initial run of identical responses, and N-1 = 14, which is the fixed
number of observations after the run of N0+1. In this case the average
number of observations used in approaching V50 but not included in the
sample of N is N0 = 1.48, and the subsequent sample on which V50 is based
has size N = 15. Such a sample would yield a confidence of 90.9% according
to Table A3.
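If the error of a V50 estimate is taken to be normal with mean zero and standard deviation equal to its RMS error (an assumption; the tabulated confidences came from the simulation itself), the confidence of lying between V25 and V75 = V50 ± 0.674σ is 2Φ(0.674/RMS) - 1. The sketch below inverts that relation for a 90% target and also checks the sample-size arithmetic quoted above. The function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()
Z75 = STD_NORMAL.inv_cdf(0.75)        # (V75 - V50) / sigma, about 0.674


def confidence_within_quartiles(rms: float) -> float:
    """P(V25 < estimate < V75), ASSUMING the error is N(0, rms^2), rms in sigma units."""
    return 2.0 * STD_NORMAL.cdf(Z75 / rms) - 1.0


def rms_for_confidence(conf: float) -> float:
    """Largest RMS error (in sigma units) giving the stated confidence, same assumption."""
    return Z75 / STD_NORMAL.inv_cdf(0.5 + conf / 2.0)


# Sample-size bookkeeping quoted for Method A with sigma_K = 2.5:
n0 = 1.48                                  # mean number of rejected rounds
n = 15                                     # fixed sample on which V50 is based
total_rounds = (n0 + 1) + (n - 1)          # 2.48 + 14

print(f"RMS needed for 90%: {rms_for_confidence(0.90):.3f} sigma")
print(f"average total rounds: {total_rounds:.2f}")
```

Under this normal-error assumption, a 90% confidence corresponds to an RMS error of roughly 0.41σ.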

An interesting feature of Method A is that it ignores any initial run of
identical responses (an indication that the initial V was badly chosen) and
therefore produces an estimate of V50 which has a guaranteed accuracy
independent of the error K in the initial V. Of course, the greater K or σK
for a fixed N, the longer the initial run of rejected values N0 and
therefore the greater the total sample size, N+N0, required to achieve a
given accuracy.

6.0  CONCLUSIONS  AND  RECOMMENDATIONS 

6.1 CONCLUSIONS. Two methods, A and B, to evaluate the 50% point in a
sensitivity test when the stimulus has random variations have been assessed
by Monte Carlo simulation, and Method A has been found superior to the other
over a realistic range of error in the starting value of the stimulus. It is
more accurate, and therefore more economical, than Method B, which is
currently used in critical velocity determination.

The optimum step level D by which the stimulus is increased or decreased was
determined to be around 1σ. However, the accuracy provided by Method A is not
very sensitive to variations of up to 50% in the size of this step level.
Therefore the performance of the estimate of V50 provided by Method A is not
sensitive to errors in the guessed value of σ.

Method A requires on average 16.48 observations to ensure a 90% confidence
that the estimate of V50 lies between V25 and V75. This number is made up of
an average of 1.48 observations that are rejected, followed by a sequence of
observations with a predetermined length of 15.

6.2 RECOMMENDATIONS. It is recommended that Method A be used to evaluate the
critical velocity required from a given projectile to defeat a target, since
it is more accurate than Method B and can also be handled more quickly and
more easily than Method B during a field trial to determine the V50. In an
experiment using Method A, the steps are:

a) Select from past experience an estimate, σ̂, of the parameter σ in the
   response function. Otherwise, use σ̂ equal to 50 ft/s as an estimate,
   since the procedure requires that σ be known within rough limits.

b) Choose N in advance. A value of N = 15 will yield a 90.9% confidence that
   the V50 estimate is between V25 and V75.

c) Fire the first shot at a velocity as close as possible to an initial
   guess of V50.

d) Carry out a series of trials, increasing the velocity by σ̂ ft/s
   following a failure and decreasing it by σ̂ ft/s following a success.
   This is done by carefully regulating the weight of propellant for each
   step.

e) Continue firing until the chosen nominal sample size N is reached. If
   N0+1 responses are alike at the beginning, the total number of trials is
   N0+N.

f) Use as an estimate of V50 the average

$$V_A = \frac{1}{N+1}\left[\sum_{i=N_0+1}^{N_0+N} V_i + V_{N_0+N} \pm \hat{\sigma}\right]$$

   where the plus sign is associated with a failure in the last trial and
   the minus sign with a success.
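Steps d) and e) amount to simple bookkeeping that can be kept alongside the firing log. The hypothetical helpers below (names and the 50 ft/s default are illustrative; σ̂ should come from step a)) give the next aimed velocity and, once the opening run of identical responses has ended, the total number of rounds N0 + N to be fired.

```python
def next_aimed_velocity(last_velocity: float, last_success: bool,
                        sigma_hat: float = 50.0) -> float:
    """Step d): up sigma_hat ft/s after a failure, down sigma_hat ft/s after a success."""
    return last_velocity - sigma_hat if last_success else last_velocity + sigma_hat


def total_rounds_required(outcomes, n: int):
    """Step e): N0 + N once the opening run has ended, else None (not yet known)."""
    for i, outcome in enumerate(outcomes):
        if outcome != outcomes[0]:
            n0 = i - 1          # the opening run held i = N0 + 1 rounds
            return n0 + n
    return None


# Hypothetical log: success, success, failure -> opening run of 2, so N0 = 1.
log = [True, True, False]
print(next_aimed_velocity(2400.0, log[-1]))   # aim up after the failure
print(total_rounds_required(log, 15))
```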
EVALUATING AND SCHEDULING PROTOTYPE
REQUIREMENTS  FOR  SUITABILITY  TESTING 


Major  Richard  B.  Cole  &  Major  William  J.  Owen 
U. S. Army Infantry Board
Fort  Benning,  Georgia 


ABSTRACT. This paper addresses the problem of developing a schedule for
suitability testing of the prototype of a complex item. The sequential
approach discussed involves ordering the requirements against which the
prototype is to be evaluated and then using this ordered set of requirements
as a basis for sequencing the subtests included in the suitability test.
Emphasis is placed on developing the ordered set of requirements.

A model based on the criteria recommended by Fishburn and by Moore and Baker
is developed for mapping the requirements from a randomly arranged set to an
unconstrained test sequence. A linear model is developed which is determined
to be an acceptable normative model. Based on the results of this model, a
method is proposed for partitioning requirements into subtests and for
sequencing subtests into a constrained test schedule.

PURPOSE. The purpose of this presentation is to develop a method useful in
scheduling a suitability test for the prototype of a complex item.

A suitability test is a test designed to evaluate the prototype in order to
determine if the item represented by the prototype is suitable for
production. The overall test of the prototype will usually involve
evaluating the prototype against many, often related, requirements.
Generally, a separate test is required for the evaluation of one or more
related requirements. Consequently, the suitability test will actually
consist of a series of individual tests, or subtests. The specific problem
to be addressed is to determine a method of scheduling the subtests to
maximize the rate at which information relative to the potential suitability
of the item is generated during the suitability test.

BACKGROUND. In the development of a complex item of equipment, it is common
for the equipment to undergo a research and development (R&D) cycle several
years in length and to incur R&D costs of several million dollars. One of
the last phases of the R&D cycle is the development and test of a prototype
of the item. The actual test of the prototype can be quite expensive and
time-consuming and can directly affect the final cost and final availability
date of the end item. Consequently, if this phase of the cycle could serve
its purpose most efficiently, then an important portion of the cost and
developmental time of the end item could be minimized.

The purpose of the suitability test of a prototype is to provide information
upon which a decision of item disposition can be made. The decision usually
will be to determine whether the item represented by the prototype should be
accepted and placed into production, accepted contingent upon certain
modifications, retained for further development, or rejected from further
consideration. This decision may have to be made prior to the completion of
the suitability test; thus it is essential to maximize the flow of
information.


The overall suitability test will consist of a series of intermediate tests,
each designed to evaluate the prototype against one or more specifications
or operational requirements. Each of the subtests derives specific
information about the prototype. The information accumulated from all
subtests then serves as a basis for the decision relating to the final
disposition of the item.

The time required to make the decision on equipment disposition relates directly to the time required to accumulate sufficient information upon which the decision can be based. Consequently, it is desirable that the subtests be scheduled so as to maximize the rate of information generated. This is a particularly important criterion in scheduling tests of prototypes of items required for an immediate need. On the other hand, care must be exercised to prevent a premature decision on item disposition. An incorrect decision could result in accepting an expensive but unsatisfactory piece of equipment, or it could result in delaying the production of a suitable item.

The problem of developing a test schedule which will maximize the rate of information generated is compounded, and made more important, by the fact that there is frequently no predetermined stopping rule upon which the decision on item disposition can be made. For example, it may be undesirable to decide before the test that if a certain percentage of the operational requirements are not met, the testing will stop and the item will be rejected. This type of stopping rule may be unsatisfactory, since the performance of the prototype against other requirements may be so outstanding as to overshadow its failures, or the degree of failure may be more important than the failure itself.

CONCEPT.  There  are  multiple  factors  relating  to  a  suitability  test 
which  influence  the  desired  sequencing  of  its  subtests.  These  factors  must, 
of  course,  relate  to  the  amount  of  potential  information  which  could  be 
gained  from  executing  the  subtest.  An  illustrative  factor  pertaining  to 
the  amount  of  information  is  the  importance  of  the  requirements  tested. 

For example, evaluating the prototype against an essential requirement would contribute more information upon which to base the decision of item disposition than would evaluating it against a relatively minor requirement. However, there may be several factors which warrant consideration. In the tests considered in this research, five factors were identified as influencing the desired relative placement in the testing sequence, and these factors were found to be of varying degrees of relative importance.

In addition to the factors, the degree to which each factor would apply to each requirement must be considered. The possible degrees of applicability of a factor to the requirements in the suitability test are defined as the categories of the factor. In this research, methods were developed for identifying, weighting as to relative importance, and categorizing each factor applicable to the suitability test of a prototype. The factors, factor weights, and factor categories are considered to be suitability test dependent.




Once the factors, factor weights, and factor categories have been determined for a prototype, each requirement is described in terms of the degree to which the factors apply to the requirement and to the subtest needed for the evaluation.

Obviously there are trade-offs to be made between the desire to place essential requirements early in the test sequence and the desire to maintain the prototype in testable condition. These trade-offs become unmanageable when several factors of several categories each must be considered.

A  model  is  developed  for  mapping  the  requirements  from  a  randomly 
ordered  collection  of  requirements  to  an  ordered  set  of  requirements.  This 
model  maps  the  requirement  against  which  the  prototype  should  first  be 
evaluated  to  the  first  place  in  the  ordered  set. 

When the requirements are ordered, they are in the proper sequence for unconstrained testing of requirements. However, in developing the actual test schedule, there may be constraints which require that several requirements be grouped into one subtest, which prevent the tests from being sequenced as desired, or which affect the test schedule in other ways. Consequently, a second model is needed to map the requirements from their positions in the ordered set to their final positions in the test schedule. The concept upon which this research was based is shown in Figure 1.

Three vital tasks must be accomplished to execute this schematic concept. First, for each prototype to be tested, the appropriate factors, categories and weighting values must be determined. Second, a model must be developed which will map the set of random requirements according to the parameters determined. The third task is to develop a scheduling algorithm to transform the set of ordered requirements into a constrained sequence of ordered subtests.

The first two tasks were accomplished by experimenting with actual suitability tests that were being conducted in the US Army R&D community. A brief summary of this portion of the research will be given. The third task, developing a scheduling algorithm, has been partially accomplished and is included for future consideration. However, it should be noted that this algorithm has not been used on an actual suitability test as of this date.

DETERMINATION OF MODEL PARAMETERS. The primary concern is to develop a model that maximizes the flow of information upon which to base the disposition decision. It is desired to place the most important requirements first in the testing sequence. The desired placement is a function of the requirement's importance and of the effect that testing the prototype against the requirement may have on the overall rate of information flow.

If a model can be constructed which develops a measure that represents the "Requirement Importance vs Effect on Overall Information Flow" trade-off for each requirement, then these measures can be used in specifying and sequencing subtests. This measure of criticality for requirement $i$ will be denoted $C_i$.

The next task is to identify the factors which are relevant to measuring $C_i$. These factors will be applicable to any suitability test, but to varying degrees for each prototype. This phase of the research was conducted through extensive interviews and group discussions with experienced test officers at a test installation. As a result, five major factors were identified for consideration and for inclusion in the model. These factors are shown in Figure 2.

[Schematic: set of randomly placed requirements → set of ordered requirements (unconstrained sequence of test requirements) → constrained sequence of ordered subtests]

SCHEMATIC OF CONCEPT
Figure 1

These five factors should not be considered an exhaustive list applicable to all suitability tests; there should be a flexible method for selecting the factors appropriate to each test analyzed. Consequently, it is proposed that the first step in analyzing a suitability test be to present the above five factors to the test supervisory personnel and ask them to consider the applicability of each. In addition to considering the applicability of these five, it is also essential that they be given the opportunity to add other factors if needed.

Once the factors applicable to testing a prototype have been identified, the importance of each factor, $j$, relative to the other factors must be determined. These are simply factor weights and will be denoted $W_j$.

In their discussion of scoring models, Moore and Baker (1) stress the importance of assigning weights to factors in order to ensure that the model reflects the priorities of the decision makers. Similarly, in the model being developed, it is essential that weights be determined to reflect the relative importance of the factors.

Numerous means are available to determine relative importance. These include simple rank ordering, correlated simple rankings, ratings, and successive ratings. The method of successive ratings was selected for this research because:

a. It is a simple and fast method;

b. It allows the decision maker to determine the weights considered appropriate by each judge as well as the overall group weights;

c. It forces each judge to develop ratings which he feels to be consistent; and

d. The method is intuitively appealing.

For  this  portion  of  the  research,  test  supervisory  personnel  ranked 
the  factors  applicable  to  the  test  of  a  prototype  by  a  simplified  version 
of  the  Delphi  Technique.  This  simplified  ranking  scheme  converged  rapidly 
to  a  ranking  acceptable  to  each  judge.  Once  the  factors  had  been  ranked, 
the  method  of  successive  ratings  was  used  to  assign  weights  to  each  factor. 

This portion of the research used two major suitability tests as experimental vehicles. The rankings and weights assigned in one test differed from those assigned in the other. Whether these differences are due to differences in the prototypes for the two tests or to differences between the groups is an unanswered question. However, it appears that the prototype tested is the most important factor, since the members of both groups agreed that they could rank and weight the factors for any particular test.

After the factors considered important for inclusion in the model have been selected and weighted, the next task is to categorize each of them. Categorization is merely the partitioning of each factor into levels.

PROBABILITY OF FAILURE: The estimated probability that the prototype will not meet the specific requirement. (Relates to the importance of the requirement.)

CONFIDENCE LEVEL: The estimated accuracy of the estimated probability of failure. (Relates to the importance of the requirement.)

IMPACT: The importance of the requirement to the potential suitability of the item. (Relates to the importance of the requirement.)

DESTRUCTIVENESS: The potential destructiveness of the subtest required for testing the prototype against the requirement. (Relates to the effect on information flow.)

CONSEQUENCE: The effect that the results of the subtest evaluating the requirement would have on the test schedule if the requirement is not met. (Relates to the effect on information flow.)

FACTORS, j
Figure 2


The  categorization  of  the  factors  in  this  research  was  accomplished 
by  two  test  project  groups  and  was  generally  based  on  the  guidance  found 
in  the  US  Army  Test  and  Evaluation  Command  Regulation  70-34  on  Risk  Analysis 
(2).  The  categories  shown  in  Figures  3,  4,  and  5  were  agreed  upon  by  both 
test  project  groups.  It  must  be  emphasized  that  these  categories  are  merely 
suggested  and  may  in  fact  be  prototype  dependent. 

In categorizing the estimated probability that a requirement would not be met, the test groups indicated a preference for using the point estimate of the probability. The same technique was also used in categorizing the confidence level with which the estimate of probability of failure is made.

Now, with each factor partitioned into categories, it is necessary to weight the categories of each factor. This portion of the research again used a modified Delphi Technique to order the categories of each factor. Then each member of each test panel weighted the categories of each factor on a scale from one to ten, with ten applied to the most important category and one applied to the least important category. The other categories were scaled between one and ten. The category score was assigned as the average of the judges' scores for each category.
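A minimal sketch of the category-score step, assuming each judge's ratings are collected as a mapping from category to a number on the one-to-ten scale. The function name and the panel's ratings are hypothetical, invented only to exercise the code; the check that each judge uses both ends of the scale follows the rule stated above.

```python
from statistics import mean

def category_scores(ratings_by_judge):
    """Average each category's 1-10 rating across judges to get its category score.

    ratings_by_judge maps a judge to {category: rating}. Each judge must give 10
    to the most important category and 1 to the least important one.
    """
    judges = list(ratings_by_judge.values())
    categories = list(judges[0])
    for name, ratings in ratings_by_judge.items():
        if set(ratings) != set(categories):
            raise ValueError(f"{name} did not rate every category")
        # max == 10 and min == 1 also guarantees every rating lies in [1, 10]
        if max(ratings.values()) != 10 or min(ratings.values()) != 1:
            raise ValueError(f"{name} must anchor the scale at 10 and 1")
    return {c: mean(r[c] for r in judges) for c in categories}

# Hypothetical ratings for the factor Destructiveness (not data from the study).
panel = {
    "judge_a": {"Destructive": 1, "Damaging": 6, "Sensitive": 10, "Stable": 6},
    "judge_b": {"Destructive": 1, "Damaging": 5, "Sensitive": 10, "Stable": 7},
    "judge_c": {"Destructive": 1, "Damaging": 6, "Sensitive": 10, "Stable": 5},
}
scores = category_scores(panel)
print({c: round(s, 1) for c, s in scores.items()})
```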

The net category weight for each category of a factor was then computed as the product of the factor weight and the category score. An example is shown in Figure 6.
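The net-weight arithmetic is one multiplication per category. The sketch below, a minimal illustration rather than the authors' own tooling, reproduces the net category weight column of Figure 6 for the factor Destructiveness (factor weight 3.0); the function name is hypothetical.

```python
# Figure 6 inputs for the factor Destructiveness.
FACTOR_WEIGHT = 3.0  # W_j
CATEGORY_SCORES = {"Destructive": 1.0, "Damaging": 5.8, "Sensitive": 10.0, "Stable": 6.1}

def net_category_weights(factor_weight, category_scores):
    """Net category weight of each category: factor weight times category score."""
    return {category: factor_weight * score for category, score in category_scores.items()}

net = net_category_weights(FACTOR_WEIGHT, CATEGORY_SCORES)
for category, weight in net.items():
    print(f"{category:<12}{weight:5.1f}")  # 3.0, 17.4, 30.0, 18.3 as in Figure 6
```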

It  was  found  during  this  portion  that  the  categories  of  Impact  and 
Probability  of  Failure  were  constant  in  both  tests.  This  may  be  a  random 
occurrence  or  it  may  be  true  for  all  suitability  testing.  The  categories 
for  the  other  factors  were  not  so  clear  and  this  indicates  prototype 
dependence. 

The result thus far has been the determination of the parameters which may be included in the model. The next task is to determine the parameters applicable to each requirement. The results of this portion of the research indicate that each member of the test group should categorize each requirement. A composite of these categorizations is then given to the test project officer for a final determination of the category of each requirement. This step in the procedure is seen as a simplified version of the Delphi Technique.
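The form of the composite handed to the test project officer is not specified. One simple assumption is a per-factor tally of the judges' choices, with non-unanimous factors flagged for the officer's final determination; the sketch below implements that reading, and its names and data are hypothetical.

```python
from collections import Counter

def composite(categorizations):
    """Summarize one requirement's categorizations for the test project officer.

    categorizations maps each factor to the list of categories chosen by the
    individual judges. For each factor, return the vote tally (most common
    first) and whether the judges were unanimous.
    """
    summary = {}
    for factor, choices in categorizations.items():
        tally = Counter(choices).most_common()
        summary[factor] = {"tally": tally, "unanimous": len(tally) == 1}
    return summary

# Hypothetical categorizations of a single requirement by three judges.
requirement_12 = {
    "Impact": ["CRITICAL", "CRITICAL", "CRITICAL"],
    "Consequence": ["TEST DELAY", "SUSPEND TESTING", "TEST DELAY"],
}
for factor, entry in composite(requirement_12).items():
    flag = "" if entry["unanimous"] else "  (officer to decide)"
    print(factor, entry["tally"], flag)
```

The officer still makes the final call; the tally only shows where the panel disagreed.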

A  summary  of  the  parameter  development  is  shown  in  Figure  7. 

CONSTRUCTING AND TESTING THE MODEL. The problem now is to develop a model that uses these parameters to transform the random requirements into a set of ordered requirements. It is hypothesized that such a model would be of the form

$$C_i = G\{[N_1^k(i)], [N_2^k(i)], \ldots, [N_j^k(i)], \ldots, [N_n^k(i)]\}$$

where $C_i$ is the measure of criticality (i.e., a number which reflects the "Requirement Importance vs Effect on Overall Information Flow" trade-off of requirement $i$) and $N_j^k(i)$ is the importance of category $k$ of factor $j$, relative to the other categories of $j$, for requirement $i$.

DESTRUCTIVE: Testing against the requirement is potentially destructive to the test item.

DAMAGING: Testing against the requirement is potentially damaging to the test item or to components not under test.

SENSITIVE: The requirement relates to a component which is delicate and which could be easily damaged during the course of unrelated tests.

STABLE: The requirement does not require potentially destructive testing and does not relate to a delicate component.

CATEGORIES OF THE FACTOR DESTRUCTIVENESS
Figure 3


CRITICAL: Failure to meet the requirement is sufficient for declaring the item unsuitable.

IMPORTANT: Failure to meet the requirement is not sufficient for declaring the item to be unsuitable, but the requirement will be given major consideration in making the final determination of suitability.

DESIRED: The requirement will be given some consideration in making the final determination of suitability.

MINOR: The requirement will be given little or no consideration in making the final determination of suitability.

CATEGORIES OF THE FACTOR IMPACT
Figure 4


If the requirement is not met, the consequence to the test plan may be:

STOP TESTING: The test will be stopped for an undetermined length of time or will be terminated.

SUSPEND TESTING: The test will result in a slippage of more than 5 days in the test schedule.

TEST DELAY: There will be a test schedule slippage of from 1 to 5 days.

DEGRADE TEST: Testing may continue in a degraded mode while the deficiency is being corrected. There will be no test schedule slippage nor significant effect on the determination of suitability of the item under test.

OVERTIME REQUIRED: Retesting or additional work will be required, but there should be no test schedule slippage.

RESCHEDULING: Testing will continue, but rescheduling of subsequent requirements will be required. However, neither rescheduling nor retesting should result in test schedule slippage.

REPEAT TEST: Testing will continue, but the failed requirement will require re-evaluation during other planned tests.

WAIVE: The requirement will probably be waived as being overly stringent or beyond the current state of the art. Failing the requirement will have no effect on the test schedule.

NONESSENTIAL: The requirement will not affect the determination of suitability, and failing the requirement will have no effect on the test schedule.

CATEGORIES OF THE FACTOR CONSEQUENCE
Figure 5

Factor (j): Destructiveness
Factor weight $W_j$ (0, 10): 3.0

Category (k)    Category score (1, 10)    Net category weight $N_j^k$
Destructive     1.0                       3.0
Damaging        5.8                       17.4
Sensitive       10.0                      30.0
Stable          6.1                       18.3

EXAMPLE OF FACTOR DESTRUCTIVENESS
Figure 6


[Summary chart; only its first step survives: Identify the appropriate factors through test group discussion.]

DETERMINATION OF MODEL PARAMETERS
Figure 7

In determining G, only linear and simple multiplicative functions are evaluated. Disjunctive and conjunctive functions described by Einhorn (3, 4), as well as logarithmic functions, were considered but not included in this research. The primary reason for not investigating these forms is that the potential benefits of a more complicated model would be offset by its computational difficulties. It must be stressed that this research was oriented towards the suitability test planner, who cannot be expected to have a background in operations research or another strongly mathematical discipline.

Six models were evaluated during this research. Models 1 and 2 are formulated as:

$$C_i = \sum_{j=1}^{n} N_j^k(i) \qquad (1)$$

$$C_i = \prod_{j=1}^{n} N_j^k(i) \qquad (2)$$

Models 3 and 4 are respectively formulated as:

$$C_i = \sum_{j=1}^{m} N_j^k(i) \qquad (3)$$

$$C_i = \prod_{j=1}^{m} N_j^k(i) \qquad (4)$$

where the factor Confidence Level is not included in the set $j = (1, 2, \ldots, m)$. Models 5 and 6 are respectively formulated as:

$$C_i = \sum_{j=1}^{3} N_j^k(i) \qquad (5)$$

$$C_i = \prod_{j=1}^{3} N_j^k(i) \qquad (6)$$

where $j$ denotes the three factors Impact, Probability of Failure, and Consequence as specified in TECOM Regulation 70-34.
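The six scoring rules differ only in how the net category weights are combined (sum or product) and in which factors enter. A minimal Python sketch, assuming the five factors of Figure 2 are the full set, so that n = 5 and m = 4 once Confidence Level is dropped; the requirement's weights are hypothetical.

```python
from math import prod

# Factor sets, assuming the five factors of Figure 2 make up the full set (n = 5).
FIVE = ["Probability of Failure", "Confidence Level", "Impact",
        "Destructiveness", "Consequence"]
WITHOUT_CONFIDENCE = [f for f in FIVE if f != "Confidence Level"]   # m = 4
TECOM_THREE = ["Impact", "Probability of Failure", "Consequence"]

# Model number -> (how net category weights are combined, which factors are used)
MODELS = {
    1: (sum, FIVE), 2: (prod, FIVE),
    3: (sum, WITHOUT_CONFIDENCE), 4: (prod, WITHOUT_CONFIDENCE),
    5: (sum, TECOM_THREE), 6: (prod, TECOM_THREE),
}

def criticality(net_weights, model):
    """Score C_i for one requirement from its net category weights N_j^k(i)."""
    combine, factors = MODELS[model]
    return combine(net_weights[f] for f in factors)

# Hypothetical net category weights for one requirement.
requirement = {"Probability of Failure": 20.0, "Confidence Level": 12.0,
               "Impact": 40.0, "Destructiveness": 30.0, "Consequence": 25.0}
for model in MODELS:
    print(model, criticality(requirement, model))
```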

This phase of the research involved designing an experiment in which significant indications of the relative desirability of additive and multiplicative models could be determined. Models 1 and 2 are included since it is hypothesized that one of them is the desired model. Model 6 was included because it was proposed by the US Army Test and Evaluation Command for identifying "high risk" requirements. Models 3 and 4 were considered since the factor Confidence Level was not deemed appropriate in one of the test projects considered in this research. Models 3 and 4 are essentially compromises between Models 1 and 2 and Models 5 and 6.

Each model was used to compute the measure of criticality for each requirement. For ease of reading, the term "score" is used as synonymous with the term "measure of criticality."

Based on the scores computed by the six models, six sequences of requirements were generated. The requirement placed first in each sequence was the one receiving the highest score under the corresponding model; the other requirements were then sequenced in order of decreasing score. As expected, the sequences were not identical.
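Sequencing is then a sort on the scores. The hypothetical two-factor example below is chosen to show how an additive and a multiplicative model can order the same requirements differently; it is an illustration, not data from Test A or Test B.

```python
from math import prod

def sequence(requirements, combine):
    """Return requirement ids in order of decreasing score (first = tested first).

    requirements maps an id to its net category weights; combine is sum for an
    additive model or math.prod for a multiplicative one. Ties keep input order.
    """
    scores = {rid: combine(weights.values()) for rid, weights in requirements.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical two-factor example chosen so the two model forms disagree.
reqs = {
    "R1": {"f1": 10.0, "f2": 10.0},   # sum 20, product 100
    "R2": {"f1": 18.0, "f2": 3.0},    # sum 21, product 54
    "R3": {"f1": 6.0, "f2": 12.0},    # sum 18, product 72
}
print("additive:      ", sequence(reqs, sum))    # ['R2', 'R1', 'R3']
print("multiplicative:", sequence(reqs, prod))   # ['R1', 'R3', 'R2']
```

The product rewards requirements that score moderately on every factor, while the sum lets one very high factor dominate, which is why the two sequences diverge.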

Procedures were developed for identifying the most desirable sequence, that is, for determining a ranking of the sequences and consequently a ranking of the models. The purpose was to identify the better function (linear or simple multiplicative) and to determine whether the model should include the five factors identified earlier or only the three factors identified by TECOM. Since the sequences are determined by the models, it is assumed that the sequence identified as most desirable must be the output of the best model. Three procedures were used in attempting to identify the best model.

The first procedure involved the test officer's attempting to rank the sequences generated by each model. This procedure was found unsatisfactory since it required him to consider too many variables. For example, one test had 59 requirements, 5 factors and 7 categories per factor. This amounts to more than 2000 decision variables (59 × 5 × 7 = 2065) in comparing just two sequences.

The second procedure involved the test personnel discriminating between sequences indirectly. The technique used was to have the test officers compare requirements which had received appreciably different rankings in the linear and multiplicative models. The procedure appeared feasible, but no significant results were obtained. No fault was detected in the procedure used, so it was concluded either that none of the models is a particularly good predictive model or that the judges were not consistent in their evaluations.

The third procedure involved simulating the actual results which would have been experienced if tests had been conducted according to each model. This simulation procedure addressed the normative side of model building, in that the results of the rankings, rather than the rankings themselves, are considered. This approach was found to be successful in that a ranking of sequences (and consequently a ranking of models) was generated with a significant level of concordance among evaluations.
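The text reports a significant level of concordance without naming the statistic. Kendall's coefficient of concordance W is the standard measure of agreement among m judges who each rank the same n objects, so the sketch below assumes it; it is not necessarily the statistic the authors used, and the function names and example rankings are hypothetical.

```python
def kendalls_w(rankings):
    """Kendall's coefficient of concordance W for m judges ranking n objects.

    rankings[a][i] is the rank (1 = best) that judge a gives object i.
    Assumes complete rankings without ties; W = 1 means perfect agreement.
    """
    m, n = len(rankings), len(rankings[0])
    for r in rankings:
        if sorted(r) != list(range(1, n + 1)):
            raise ValueError("each ranking must be a permutation of 1..n")
    rank_sums = [sum(r[i] for r in rankings) for i in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((total - mean_sum) ** 2 for total in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))

def chi_square(w, m, n):
    """Large-sample statistic m(n - 1)W, referred to chi-square with n - 1 df."""
    return m * (n - 1) * w

# Hypothetical: three judges ranking seven model sequences.
judges = [
    [1, 2, 3, 4, 5, 6, 7],
    [1, 3, 2, 4, 6, 5, 7],
    [2, 1, 3, 5, 4, 6, 7],
]
w = kendalls_w(judges)
print(round(w, 3), round(chi_square(w, len(judges), 7), 2))
```

The chi-square approximation is usually reserved for n larger than about seven objects; for small panels exact tables of W are preferable.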

In this procedure, it was hypothesized that if the judges could identify the simulated tests which they considered to be better scheduled, and if these tests could be ranked in order of desirability, then an ordering of the relative desirability of the models would result. For this procedure a seventh sequence, based upon random placement of requirements, was generated and identified as Model 7.

Based  upon  the  simulations  for  one  of  the  prototype  tests  (Test  A), 
the  test  sequences  were  ranked  by  test  personnel  judges.  The  results  are 
shown  in  Figure  8. 

The judges on this test agreed that each of the models produced sequences superior to Model 7. It was also concluded that the additive models (i.e., 1, 3, and 5) were respectively superior to the multiplicative models (i.e., 2, 4, and 6). It was further concluded that the five-factor models (i.e., 1 and 2) were superior to the three-factor models (i.e., 5 and 6). These conclusions were reinforced by the results of simulations on another prototype test (Test B).


RANKING OF MODELS BASED ON TEST A
Figure 8

Based on these conclusions, Models 1, 2, 5, and 6 were simulated for Test B and presented to a new panel of judges (i.e., new test supervisory personnel). These judges ranked the models as shown in Figure 9.

The  judges  concurred  that  the  sequence  generated  by  Model  1  produced 
a  test  schedule  superior  to  that  generated  by  the  others.  This  sequence 
was  so  obviously  superior  that  further  opinions  were  not  obtained. 

Test  A  was  ready  to  begin,  so  no  rescheduling  was  allowed  for  that 
prototype  based  on  model  results.  Test  B  had  sufficient  planning  time  to 
make  use  of  the  model  results.  The  test  officer  for  Test  B  was  given  the 
requirement  sequence  of  58  requirements  for  his  test  as  generated  by  Model 
1.  He  was  asked  to  use  this  ranking  in  any  way  he  saw  fit  in  scheduling 
the  test. 

In scheduling the test, the test officer first identified the constraints active for Test B. As it turned out, only technological constraints were required, and these dictated that the test consist of four subtests of multiple requirements. Three of these subtests were required to be conducted sequentially, and the fourth consisted of requirements which required evaluation throughout the entire testing period. The requirements which had to be placed in each subtest were identified and grouped within their appropriate subtests. The ranking of requirements generated by Model No. 1 was then used to order the requirements within each subtest to form the final test sequence; the ordering of requirements in each subtest was rank-order consistent with the ordering of the requirements in Sequence No. 1. Finally, the time and personnel requirements for the subtests were identified, and a tentative test schedule which required four personnel and 2 weeks was established. The test officer found the ranking of requirements generated by Model No. 1 to be of appreciable assistance when establishing the order in which the requirements would be addressed within each subtest. He also considered the resulting test schedule to be "optimum", or as nearly "optimum" as he could determine.

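The within-subtest ordering used for Test B can be sketched as a stable grouping: walk the Model 1 ranking from top to bottom and append each requirement to its assigned subtest, so every subtest's list stays rank-order consistent with the overall sequence. The assignment of requirements to subtests is taken as given (in Test B it came from technological constraints); the identifiers below are hypothetical, and the order of the subtests themselves is not decided here.

```python
def order_within_subtests(ranked, subtest_of):
    """Group requirements by subtest, keeping each group in overall rank order.

    ranked: requirement ids in Model 1 order (highest criticality first).
    subtest_of: requirement id -> subtest, fixed beforehand by constraints.
    """
    groups = {}
    for rid in ranked:  # walk the ranking top to bottom
        groups.setdefault(subtest_of[rid], []).append(rid)
    return groups

# Hypothetical ranking and constraint-driven assignment.
ranked = ["R7", "R2", "R9", "R1", "R4", "R3"]
subtest_of = {"R7": "S2", "R2": "S1", "R9": "S4", "R1": "S2",
              "R4": "S1", "R3": "S3"}
for subtest, reqs in sorted(order_within_subtests(ranked, subtest_of).items()):
    print(subtest, reqs)
```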
Prior to the conduct of this research, a tentative test schedule for Test B had been developed. According to the previously developed schedule, a planning figure of 16 weeks was established for the time required to complete the suitability test. Of course, this planning figure is a pessimistic estimate. A most likely estimate of the time required had not been determined.

Through the application of the methods and model described herein, a test schedule was developed with a most likely estimate of the time required established at 2 weeks. The test officer did not wish to establish a new planning figure, or pessimistic estimate, until he had re-evaluated all possible contingencies. However, he was confident that the new planning figure would be no more than 4 weeks. No claim is made that the use of the procedures and model developed in this research will produce a test schedule requiring less than one-fourth of the time which would otherwise be required. However, it appears that the procedures can result in either a substantial saving in test time or a more accurate estimate of the test time required.

Finally, the authors and the test officer discussed the resulting test schedule. Without the benefit of the sequence generated by the model or of the categorizations of the requirements established earlier, the test officer was asked to justify his test schedule based only upon the verbal descriptions of the requirements and his knowledge of the overall test. He was able to justify convincingly the relative placement of each requirement and found the schedule to be, somewhat to his own surprise, "optimum" from his point of view. Of course, this determination of optimality is based upon subjective judgment, and the conclusions are only as valid as the judgment of the officer making them. However, this officer is an experienced test officer and could reasonably be considered an expert in his field. Until a constrained optimization model is developed which will replace expert judgment in qualitative analysis, the opinions of experts in the field will have to be used in the determination of optimality.

Based on the apparent success of this method, it was concluded that Model 1 was applicable in developing an ordered set of requirements for use in scheduling suitability tests for a prototype item. Model 1 is a simple linear model and includes all factors considered important by the decision makers involved.

To recap the research thus far, we have determined the parameters deemed important for inclusion in the model, and we have selected an acceptable normative model to map a set of randomly placed requirements into a set of ordered requirements. This set of ordered requirements represents an ordered, unconstrained test sequence of requirements. If we could evaluate each requirement sequentially, we would maximize the rate of information flow as gauged by the measure of criticality. However, this unconstrained testing is not practical, since it does not consider time and personnel constraints on the testing sequence.

This  brings  us  to  the  third  task  outlined  in  the  research  concept. 


SCHEDULING PROCEDURES. The problem under consideration now is to determine a procedure for developing an actual test schedule which will result in the optimum rate of information being generated during the test. The possibilities of evaluating all requirements simultaneously, or of evaluating the requirements individually and sequentially, are considered infeasible and are not addressed. This portion of the research assumes that there are technological, precedence, or proximity constraints which make these types of tests impractical.

The following discussion of scheduling procedures requires the adoption of
the assumptions shown in Figure 10.

Three approaches were investigated in developing test schedules. The
first two approaches considered situations in which the requirements could
be easily partitioned into logical and practical subtests. The last approach
investigated a procedure to quantitatively assign the requirements to
subtests and then order the subtests into a constrained test sequence.

The first approach considers unconstrained test scheduling where one or
more requirements have already been assigned to each subtest. The value of
each subtest is determined from the model previously developed and the
assumptions shown in Figure 10. The time required for each subtest must be
estimated by the test planner. Such time estimates are already being made,
so this places no new requirement on the evaluating organization.




1. The measure of criticality, C*, is the same
as the value of the information which will
be gained from evaluating the prototype
against the requirements.

2. The value of a requirement is the same
as the value of the information to be
gained from evaluating the prototype
against the requirement.

3. The value of a subtest is the sum of the
values of the included requirements.

4. There is a linear relationship between
the value of the information obtained
from a subtest and the length of time
which will be spent on the subtest.

5. The time required to complete a subtest
is the same as the time required to test
the prototype against the most time-consuming
requirement included in the subtest.

6. The number of personnel required to conduct
a subtest is the sum of the personnel required
to evaluate each of its requirements
individually.


SCHEDULING  ASSUMPTIONS 
Figure  10 
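Assumptions 3, 5 and 6 fix how a subtest's value, time and personnel follow from its requirements. The minimal Python sketch below encodes just those three rules. The `Requirement` record and its field names are illustrative assumptions and do not come from the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Requirement:
    name: str
    value: float     # V_i: value of the information from this requirement
    personnel: int   # P_i: personnel needed to evaluate it on its own
    time: float      # T_i: time needed to evaluate it on its own

def subtest_attributes(reqs):
    """Return (value, time, personnel) of a subtest built from `reqs`."""
    value = sum(r.value for r in reqs)          # assumption 3: values add
    time = max(r.time for r in reqs)            # assumption 5: longest requirement governs
    personnel = sum(r.personnel for r in reqs)  # assumption 6: personnel add
    return value, time, personnel

reqs = [Requirement("R1", 5.0, 2, 3.0), Requirement("R2", 2.0, 1, 1.5)]
print(subtest_attributes(reqs))  # (7.0, 3.0, 3)
```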


Procedure A, a two-step algorithm, is recommended for determining a test
schedule under these conditions. This procedure is shown in Figure 11.

This procedure is a direct application of Theorem 3-10, stated and proved by
Conway, Maxwell and Miller in their book The Theory of Scheduling (6).
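Figure 11 is not reproduced here. However, the appendices repeatedly order items by increasing T/V, and Appendix B, Step 1 even states a tie-break for that ordering. The sketch below therefore assumes that Procedure A amounts to this ratio ordering, which is Smith's ratio rule: for a single sequence, it minimizes the value-weighted sum of completion times. The two-step structure of the original figure is not reconstructed, and applying the tie-break to subtests is an assumption.

```python
from typing import NamedTuple

class Subtest(NamedTuple):
    name: str
    value: float  # V: value of the subtest
    time: float   # T: time required for the subtest

def procedure_a(subtests):
    """Order subtests by increasing T/V; ties put the more time-consuming subtest first."""
    return sorted(subtests, key=lambda s: (s.time / s.value, -s.time))

plan = procedure_a([Subtest("S1", 4.0, 8.0), Subtest("S2", 9.0, 3.0), Subtest("S3", 2.0, 2.0)])
print([s.name for s in plan])  # ['S2', 'S3', 'S1']
```

The ratios here are 2.0, 0.33 and 1.0, so the high-value, short subtest S2 is scheduled first.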

In situations where the subtests have been established and precedence
constraints are active, a second procedure, Procedure B, is recommended.
This procedure is an application of the constrained least-cost testing
sequence described by Mankekar and Mitten (7). See Figure 12.

Basically, this procedure involves isolating those subtests for which
the precedence constraints are active and then systematically satisfying
those constraints. After this is done, Procedure A is applied in a manner
which does not violate any of the constraints previously satisfied.

Set 1 consists of those subtests for which precedence constraints are
active, and Set 2 consists of those subtests for which there are no
precedence constraints. The matrix R is an $m \times m$ matrix,

$$R = \{r_{ij}\},$$

where $m$ is the number of subtests in Set 1 and

$$r_{ij} = \begin{cases} 1 & \text{if subtest } i \text{ must precede subtest } j, \\ 0 & \text{otherwise,} \end{cases} \qquad (r_{ii} = 0).$$


The matrix R reflects all precedence constraints on Set 1. The matrix R'
is identical to R and is used merely as a working matrix. With these
definitions in hand, the procedure will lead to an optimal least-time test
sequence under the assumptions noted. The development, proof of finiteness
and proof of optimality were given by Mankekar and Mitten (7). A
computational algorithm for using this procedure is shown in Appendix A.
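Step 3 of the computational algorithm for Procedure B requires R to satisfy three conditions: $r_{ii} = 0$, $r_{ij} = 1$ implies $r_{ji} = 0$, and transitivity. The Python sketch below builds such a matrix from a list of "i must precede j" pairs. Using Warshall's algorithm to add the implied constraints is our choice, not the paper's. Subtests are indexed from 0 here, and the function name is hypothetical.

```python
def precedence_matrix(m, pairs):
    """Build R = {r_ij} for subtests 0..m-1 from pairs (i, j) meaning 'i must precede j'."""
    r = [[0] * m for _ in range(m)]
    for i, j in pairs:
        r[i][j] = 1
    # Warshall's transitive closure: r_ik = 1 and r_kj = 1 imply r_ij = 1.
    for k in range(m):
        for i in range(m):
            if r[i][k]:
                for j in range(m):
                    if r[k][j]:
                        r[i][j] = 1
    # After closure, any cycle shows up as r_ii = 1 (this also rules out r_ij = r_ji = 1).
    if any(r[i][i] for i in range(m)):
        raise ValueError("precedence constraints contain a cycle")
    return r

R = precedence_matrix(3, [(0, 1), (1, 2)])
R_work = [row[:] for row in R]  # R': an independent working copy of R
print(R)  # [[0, 1, 1], [0, 0, 1], [0, 0, 0]]
```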

In these two approaches to scheduling, it was assumed that the subtests
were predetermined. The next approach attempts to quantitatively assign
the requirements to appropriate subtests.

It is assumed that the only constraint is personnel: only N personnel
are available for commitment to a subtest. It is further assumed that
$V_i$, $P_i$ and $T_i$ are known, where these variables are, respectively,
the value, the personnel required, and the time required for requirement $i$.
$V_i$ can be determined from the model previously developed, and $P_i$ and
$T_i$ can be estimated as they are at present. The problem now becomes one of
designing the best set of subtests, which can then be sequenced by Procedure A.

This  problem  is  analogous  to  the  n/m  job  shop  problem  where  the  n  jobs 
(test  requirements)  are  assigned  to  m  machines  (subtests).  Procedure  C, 
shown  in  Figure  13,  is  presented  here  only  for  consideration  as  a  solution 
to  the  problem.  It  draws  heavily  from  the  work  done  by  Conway,  Maxwell  and 
Miller.  It  has  received  very  little  testing  and  has  not  been  applied  to  an 
actual  suitability  test.  However,  it  is  simple,  intuitively  appealing,  and 
it  can  be  carried  out  by  hand  or  coded  for  computer  use. 
Procedure B

CONCEPT SUMMARY
Figure 14


Basically, Procedure C involves forming four sequences of requirements
and alternately drawing from the sequences to form trial subtests.
The trial subtest that contains the most value is then selected, and
the procedure is repeated until each requirement is assigned to a subtest.
A computational algorithm for using Procedure C is included in Appendix B.

SUMMARY. The three procedures used during this research can be put
into perspective by relating them to the original concept of the research.
See Figure 14. The model developed maps a set of randomly placed
requirements into a set of ordered requirements. Procedure A maps this set
into an unconstrained sequence of subtests when the requirements have already
been assigned to subtests. Procedure B maps the set of ordered requirements
into a constrained sequence of subtests when the subtests have already been
established and there are active precedence constraints. Procedure C maps
the set of ordered requirements into subtests and then develops the order
or sequence for these subtests when there are active personnel constraints.

None of the scheduling procedures used during this research is
completely satisfactory. Each procedure is only a partial answer. What is
required is a procedure which will map a set of ordered test requirements
into an ordered sequence of subtests when there are active precedence,
proximity, personnel, time and economic constraints. This is an area for
future research.

However, the portions of the research dealing with parameter
identification and model development are felt to be a worthwhile basis for
further research into the problem of test scheduling when there are
multiple active constraints.
APPENDIX A

COMPUTATIONAL ALGORITHM FOR PROCEDURE B

STEP 1: Form two sets of subtests. Let Set 1 consist of those
subtests for which precedence constraints are active. Let
Set 2 consist of those subtests for which no precedence
constraints are active. Steps 2 through 15 refer to
Set 1 only.

STEP 2: Order the subtests in Set 1 by Procedure A. Index the
subtests according to their relative position in S(A),
with the first subtest in the sequence being denoted $s_1$.

STEP 3: Form an $m \times m$ matrix $R = \{r_{ij}\}$ where
$r_{ij} = 1$ if subtest $i$ must precede subtest $j$,
otherwise $r_{ij} = 0$; and
$r_{ii} = 0$; and
$r_{ij} = 1$ implies $r_{ji} = 0$; and
if $r_{ij} = 1$ and $r_{jk} = 1$, then $r_{ik} = 1$.

STEP 4: Form a matrix $R' = \{r'_{ij}\}$ identical to matrix $R$.

STEP 5: Set the index $k = 1$.

STEP 6: Consider each pair of subtests $i$ and $j$. If $r'_{ij} = 1$ and
subtest $i$ precedes subtest $j$ in the current sequence,
set $r'_{ij} = 2$.

STEP 7: If $r'_{ij} \neq 1$ for all $i$ and $j$, go to Step 15.

STEP 8: Scan $R'$ to determine if $r'_{ik} = 1$ for any $i$. If there exists
an $i$ such that $r'_{ik} = 1$, go to Step 9. If $r'_{ik} \neq 1$ for all $i$,
set $k$ equal to the index of the next subtest in the current
sequence and repeat this step.

STEP 9: Form the set $T_k$ of all subtests $i$ for which $r'_{ik} = 1$.

STEP 10: Apply Procedure A to set $T_k$ to form the ordered set $T'_k$.

STEP 11: Place the ordered set $T'_k$ immediately in front of subtest $s_k$.

STEP 12: Consider each pair of subtests $i$ and $j$ for which $r'_{ij} = 2$.
If subtest $j$ now precedes subtest $i$, set $r'_{ij} = 1$.

STEP 13: Set $k$ equal to the index of the first subtest in the current
sequence of subtests in Set 1.

STEP 14: Go to Step 6.

STEP 15: Label the current sequence $S(A)'$.

STEP 16: Apply Procedure A to Set 2. Label the resulting sequence
$S(A)''$, where $S(A)'' = \{s''_j\}$.

STEP 17: Form sequence $S(B)$ from sequences $S(A)'$ and $S(A)''$ by
iteratively integrating the $s''_i$ into $S(A)'$ such that
$T'_j/V'_j \le T''_i/V''_i \le T'_{j+1}/V'_{j+1}$. $S(B)$ is the desired
sequence for testing.
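Step 17 does not spell out how the $s''_i$ are "iteratively integrated" into $S(A)'$. The Python sketch below shows one possible reading, which is an assumption. The order of $S(A)'$ is kept intact, so the precedence constraints satisfied in Steps 1 through 15 still hold. Each Set 2 subtest is then placed just before the first element whose T/V exceeds its own. That position satisfies the inequality in Step 17 locally.

```python
def merge_by_ratio(constrained, free):
    """Step 17 sketch: insert unconstrained subtests into S(A)' by their T/V ratio.

    Subtests are (name, T, V) tuples. The order of `constrained` (S(A)') is never
    changed; each free subtest goes just before the first element whose T/V
    exceeds its own (or at the end if there is none).
    """
    def ratio(s):
        return s[1] / s[2]

    merged = list(constrained)
    for s in sorted(free, key=ratio):  # S(A)'' is in T/V order (Procedure A)
        pos = next((i for i, t in enumerate(merged) if ratio(t) > ratio(s)), len(merged))
        merged.insert(pos, s)
    return merged

s_a1 = [("A", 2.0, 4.0), ("B", 6.0, 3.0)]   # S(A)': T/V = 0.5, 2.0
s_a2 = [("X", 3.0, 3.0), ("Y", 1.0, 10.0)]  # Set 2: T/V = 1.0, 0.1
print([s[0] for s in merge_by_ratio(s_a1, s_a2)])  # ['Y', 'A', 'X', 'B']
```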
APPENDIX B

COMPUTATIONAL ALGORITHM FOR PROCEDURE C


STEP 1: Apply Procedure A to the requirements to form Sequence 1.
For those requirements for which $T_i/V_i = T_j/V_j$, place the
more time-consuming requirement first in the sequence.

STEP 2: Construct Sequence 2 of requirements, with the requirements
being ranked in order of increasing personnel required.
Resolve ties by placing the more time-consuming
requirement first.

STEP 3: Construct Sequence 3 of requirements, with the requirements
being ranked in order of decreasing value of $T_i$. Resolve
ties by placing the more valuable requirement first.

STEP 4: Construct Sequence 4 of requirements, with the requirements
being ranked in order of decreasing value of $V_i$. Resolve
ties by placing the more time-consuming requirement first.

STEP 5: Construct a base subtest by including in the subtest the
first consecutive requirements from Sequence 3 until the
inclusion of the next requirement in the sequence would
violate the personnel constraint. Call this base subtest
BT.


STEP 6: Construct three tentative subtests as follows:

a. Add the first consecutive requirements from Sequence 4
to BT until the inclusion of the next requirement in
the sequence would violate the personnel constraint.
Next, add the first consecutive requirements from
Sequence 1 to the current subtest until the inclusion
of the next requirement would violate the personnel
constraint. Finally, add to this subtest the
first consecutive requirements from Sequence 2 until
the inclusion of the next requirement would violate
the personnel constraint. Label this subtest $BT_1$.

b. Construct subtest $BT_2$ in a manner similar to
constructing $BT_1$. However, in forming $BT_1$ requirements
were added to BT from Sequences 4, 1 and 2, in that
order; in forming $BT_2$, add requirements to BT from
Sequences 1, 4 and 2, in that order.

c. Construct subtest $BT_3$ by adding to BT the first consecutive
requirements from Sequence 2 until the inclusion of
the next requirement in the sequence would violate
the personnel constraint.

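STEP 7: From subtests $BT_1$, $BT_2$ and $BT_3$, select the subtest with
the greatest value. Note that the time required for each of these
subtests is the same, since each is based upon BT: BT begins with
the most time-consuming remaining requirement (Sequence 3 is ranked
by decreasing $T_i$), and by assumption 5 of Figure 10 a subtest takes
as long as its most time-consuming requirement. Consequently, this
step involves selecting the subtest with the minimum value of T/V.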

STEP 8: Delete from Sequences 1, 2, 3 and 4 those requirements
included in the subtest selected in Step 7.

STEP 9: If each of the requirements has been included in a selected
subtest, go to Step 10. Otherwise, return to Step 5.

STEP 10: Apply Procedure A to the subtests generated to form Test
Sequence S(A).

STEP 11: Scan S(A) until the first subtest is found in which the
personnel constraint is not active. Call this subtest k,
with test time required $T_k$ and personnel required $P_k$.
If no such subtest is located, go to Step 14.

STEP 12: Continue to scan S(A) until the first requirement $R_i$ is
found such that $T_i \le T_k$ and $P_i \le N - P_k$. Place $R_i$ into
subtest k. If no such requirement is located, return
to Step 11. Scan immediately below subtest k.

STEP 13: Return to Step 10.

STEP 14: Stop. The current sequence is the desired sequence, which
should be the basis of the testing schedule.
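The Python sketch below covers Steps 1 through 10 of the algorithm above: the four sequences, the base subtest BT, the trial subtests $BT_1$, $BT_2$ and $BT_3$, and the final ordering by T/V. It omits the back-filling in Steps 11 through 13. It also rests on assumptions of ours that the appendix does not state. A requirement already in a subtest is skipped when a later sequence is drawn from. Ties in Step 7 go to $BT_1$. A requirement that alone needs more than N personnel is reported as an error.

```python
from typing import NamedTuple

class Req(NamedTuple):
    name: str
    V: float  # value of the requirement
    P: int    # personnel required
    T: float  # time required

def fill(subtest, sequence, N):
    """Add consecutive requirements from `sequence` until the next one would exceed N personnel."""
    out = list(subtest)
    used = sum(r.P for r in out)
    for r in sequence:
        if r in out:  # assumption: requirements already in the subtest are skipped
            continue
        if used + r.P > N:
            break
        out.append(r)
        used += r.P
    return out

def value(s):
    return sum(r.V for r in s)

def procedure_c(reqs, N):
    seq1 = sorted(reqs, key=lambda r: (r.T / r.V, -r.T))  # Step 1 (Procedure A order)
    seq2 = sorted(reqs, key=lambda r: (r.P, -r.T))        # Step 2
    seq3 = sorted(reqs, key=lambda r: (-r.T, -r.V))       # Step 3
    seq4 = sorted(reqs, key=lambda r: (-r.V, -r.T))       # Step 4
    subtests = []
    while seq3:  # Step 9: repeat until every requirement is in a subtest
        bt = fill([], seq3, N)  # Step 5: base subtest BT
        if not bt:
            raise ValueError(f"{seq3[0].name} alone needs more than {N} personnel")
        bt1 = fill(fill(fill(bt, seq4, N), seq1, N), seq2, N)  # Step 6a
        bt2 = fill(fill(fill(bt, seq1, N), seq4, N), seq2, N)  # Step 6b
        bt3 = fill(bt, seq2, N)                                # Step 6c
        best = max((bt1, bt2, bt3), key=value)                 # Step 7
        chosen = set(best)
        seq1, seq2, seq3, seq4 = [[r for r in s if r not in chosen]
                                  for s in (seq1, seq2, seq3, seq4)]  # Step 8
        subtests.append(best)
    # Step 10: order subtests by T/V, with T = longest requirement and V = total value.
    return sorted(subtests, key=lambda s: max(r.T for r in s) / value(s))

reqs = [Req("A", 10, 3, 4), Req("B", 6, 2, 2), Req("C", 3, 1, 1), Req("D", 8, 4, 3)]
print([[r.name for r in s] for s in procedure_c(reqs, N=5)])  # [['A', 'B'], ['D', 'C']]
```

The first round builds BT = [A]. The trial subtests are then [A, B], [A, B] and [A, C], and [A, B] wins on value.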
STOPPING RULES FOR SEQUENCING WITH PARTICULAR
REFERENCE  TO  MISSILE  RANGE  SCHEDULING* 


Paul  H.  Randolph 

Department  of  Mathematical  Sciences 
New  Mexico  State  University 
Las  Cruces,  New  Mexico 

ABSTRACT. Monte Carlo methods have been proposed for finding solutions
to scheduling problems. One deficiency of these methods has been
the absence of appropriate rules for stopping the sampling process. This
paper presents stopping rules that not only have been found effective for
a variety of sequencing problems but also provide a measure of the quality
of the sequence chosen. Reference is made to missile range scheduling.

MISSILE RANGE SCHEDULING. At a missile range, a set of missions is
requested each day. One way to schedule these missions is to take a
permutation of the missions and schedule them as early in the day as
possible, in the order of the permutation, with no conflict in the
resources required for each mission. Because of the limited length of the
standard work day, it may not be possible to schedule some missions when
using a given permutation. Different permutations will give schedules with
different sets of scheduled and unscheduled missions.

With each mission there is associated a payoff, so that a schedule
payoff is the sum of the payoffs of the scheduled missions. If a permutation
is selected by a random procedure, then the corresponding schedule payoff
can be considered a random variable. By taking a sequence of random
permutations, a sequence of random schedule payoffs is obtained.

If "enough" of these random payoffs are obtained, the random schedule
generation can be terminated, and the schedule corresponding to the best
of the observed schedule payoffs can be used for the set of missions requested
for the day. The problem, of course, is to determine how much is "enough";
in other words, when to stop the random generation of schedules or
sequences.

STOPPING RULES FOR INTEGER PAYOFFS. Let $X_1, X_2, \ldots$ denote the random
variables of the payoffs associated with generating successive sequences by
a Monte Carlo sampling process. For the present, assume that each sequence
payoff is an integer and that the objective of the sequencing problem is to
find a sequence for which the payoff is maximized. Furthermore, without
loss of generality, assume that all payoffs are positive and bounded above
by the known integer $\ell$. Also, let $y_n$ denote the maximum of the payoffs
$x_1, \ldots, x_n$ obtained from the first $n$ sequences; that is,
$y_n = \max(x_1, \ldots, x_n)$. Thus, it is assumed that sampling will be with recall.


*Research for this paper was partially supported under AROD Contract No.
DAHC04-C-0011 at the Instrumentation Directorate, White Sands Missile
Range, New Mexico.
The probability function for each sequence payoff is the multinomial
characterized by $P(X = k) = p(k)$, $k = 1, \ldots, \ell$. When the values of
$p(k)$ are known, Chow and Robbins [1], [5] have shown that the optimal
stopping rule is obtained by calculating the expected increase in gross
payoff associated with generating another sequence,

$$T(y_n) = \sum_{k=y_n}^{\ell} (k - y_n)\,p(k),$$

and comparing this with the relative cost, $c$, of generating a single
sequence on the computer; that is, if $T(y_n) > c$, continue for another
observation; if $T(y_n) \le c$, stop. The function $T(y_n)$ is sometimes
called the stopping rule function.
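For a known payoff distribution, the myopic rule just described reduces to a short computation, sketched below in Python. The function names and the three-point distribution are illustrative, not taken from the paper.

```python
def expected_gain(y, p):
    """T(y) = sum_{k >= y} (k - y) p(k), for a known payoff distribution p: k -> P(X = k)."""
    return sum((k - y) * pk for k, pk in p.items() if k >= y)

def should_stop(y, p, c):
    """Myopic rule: stop once the expected gain of one more sequence is at most its cost c."""
    return expected_gain(y, p) <= c

p = {1: 0.5, 2: 0.25, 3: 0.25}
print(expected_gain(1, p), should_stop(1, p, c=0.3))  # 0.75 False
print(expected_gain(2, p), should_stop(2, p, c=0.3))  # 0.25 True
```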

Note that this stopping rule states that at each stage of the process,
the experimenter computes the expected gain from taking exactly one further
observation and then terminating the process. If the expected net gain
from taking this observation is not positive, the process is terminated.
Otherwise, the next observation is taken, and a similar computation is again
performed. This procedure is called the myopic procedure because at each
stage the experimenter does not look beyond the possible outcomes of his
very next observation when making his decision; for sampling with a
known distribution, this myopic procedure is optimal.

Unfortunately, for sequencing problems the values of $P(X = k) = p(k)$,
$k = 1, \ldots, \ell$, are almost never available. However, even though these
probabilities are not known, it is possible to obtain estimates of them
through a Bayesian analysis, and then substitute these Bayesian
estimates for the $p(k)$ in the above myopic stopping rule function to
obtain what might be called a Bayesian stopping rule for multinomial
observations [4].

To obtain these estimates, define the $\ell$-dimensional vector
$\theta = (\theta_1, \ldots, \theta_\ell)$ of probabilities such that, for the $n$-th
observation $X_n$, the conditional probability function is given by
$P(X_n = k \mid \theta) = \theta_k$, $k = 1, \ldots, \ell$, where $\theta$ is an element of
the simplex

$$S = \Bigl\{\theta \in E^\ell : \sum_{k=1}^{\ell} \theta_k = 1,\ \theta_k > 0,\ k = 1, \ldots, \ell\Bigr\}.$$

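Since the conjugate prior density [3] for the multinomial is the Dirichlet,
the initial prior density of $\theta$ can be written as

$$f_0(\theta) = \Gamma(m) \prod_{k=1}^{\ell} \frac{\theta_k^{m_k - 1}}{\Gamma(m_k)}, \qquad m = \sum_{k=1}^{\ell} m_k.$$

After $n$ observations, of which $n_k$ are equal to $k$, this density becomes

$$f_n(\theta) = \Gamma(m+n) \prod_{k=1}^{\ell} \frac{\theta_k^{m_k + n_k - 1}}{\Gamma(m_k + n_k)}.$$

This is the Bayesian prior density of $\theta$ for observation $X_{n+1}$.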


Furthermore, since the joint density function for $X_{n+1}$ and $\theta$ is
$\theta_k f_n(\theta)$, the marginal distribution

$$p_n(k) = \frac{m_k + n_k}{m + n}, \qquad k = 1, \ldots, \ell,$$

is the probability that $X_{n+1}$ will take on the value $k$.
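The predictive probabilities $p_n(k)$ need only the Dirichlet parameters $m_k$ and the counts $n_k$ of past payoffs. The Python sketch below computes them. The parameter values and observations are made up for illustration.

```python
from collections import Counter

def predictive(m_k, observations):
    """Dirichlet-multinomial predictive p_n(k) = (m_k + n_k) / (m + n).

    m_k maps each payoff k = 1..l to its Dirichlet parameter; observations are
    the n payoffs seen so far, and n_k counts how many of them equal k.
    """
    counts = Counter(observations)
    m, n = sum(m_k.values()), len(observations)
    return {k: (mk + counts[k]) / (m + n) for k, mk in m_k.items()}

probs = predictive({1: 1.0, 2: 1.0, 3: 2.0}, [1, 2, 2])
# m = 4 and n = 3, so p(1) = 2/7, p(2) = 3/7, p(3) = 2/7
```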

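This value of $P(X_{n+1} = k) = p_n(k)$ can be substituted for $p(k)$ in the
stopping rule function $T(y_n)$ to obtain what might be called the "Bayesian
stopping rule function", which will be denoted $T_B(y_n)$ and is given by

$$T_B(y_n) = \sum_{k=y_n}^{\ell} (k - y_n)\,p_n(k) = (m+n)^{-1} \sum_{k=y_n}^{\ell} (k - y_n)\,m_k,$$

where the second form holds because no observed payoff exceeds $y_n$, so
$n_k = 0$ for every $k > y_n$ (and the $k = y_n$ term vanishes).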

Comparing the value of this function with the value of $c$ determines a
stopping point; that is, if $T_B(y_n) \le c$, the sampling of sequence payoffs
should be stopped. Since $y_n$ is a monotonically non-decreasing function of
$n$, $T_B(y_n)$ is a decreasing function of $n$, which approaches zero as $n$
increases. Thus, sampling will always eventually stop.
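Putting the pieces together gives a sampling loop that stops as soon as $T_B(y_n) \le c$. The Python sketch below assumes a uniform prior. The `draw_payoff` argument is a hypothetical stand-in for "generate a random sequence and return its integer payoff", and the payoffs drawn in the usage example are simulated, not missile range data.

```python
import random

def bayes_gain(y, m_k, n):
    """T_B(y_n) = (m + n)^{-1} * sum_{k >= y_n} (k - y_n) m_k, with m = sum of the m_k."""
    m = sum(m_k.values())
    return sum((k - y) * mk for k, mk in m_k.items() if k >= y) / (m + n)

def sample_until_stop(draw_payoff, m_k, c, max_draws=100_000):
    """Draw integer payoffs until T_B(y_n) <= c; return (n, y_n), the best payoff kept."""
    y = None
    for n in range(1, max_draws + 1):
        x = draw_payoff()
        y = x if y is None else max(y, x)  # sampling with recall: keep the best so far
        if bayes_gain(y, m_k, n) <= c:
            break
    return n, y

# Hypothetical payoff generator: uniform integers 1..10, seeded for repeatability.
rng = random.Random(1)
prior = {k: 1.0 for k in range(1, 11)}  # uniform p_0(k) with m = 10
print(sample_until_stop(lambda: rng.randint(1, 10), prior, c=0.01))
```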

This Bayesian stopping rule function depends on the specification of a
set of parameters associated with the Dirichlet prior density function. If
these parameters, $m_1, \ldots, m_\ell$, are examined, it will be noted that they can
be written in terms of the initial probabilities as $m_k = m\,p_0(k)$, $k = 1, \ldots, \ell$.
Since the $p_0(k)$ are essentially normalized values of the $m_k$, it may be
preferable to specify these initial probabilities, $p_0(k)$, $k = 1, \ldots, \ell$ (of
which only $\ell - 1$ are independent), and the parameter $m$, rather than to estimate
the $m_k$ directly. This can be done by an arbitrary selection of probability
values, by specifying a discrete probability function, or even by integrating
a continuous function over a unit interval containing $k$.

The parameter $m$ has some interesting characteristics. A lower bound
for $m$ is zero, and this can be a greatest lower bound only when
$p_0(k) = 1/\ell$, $k = 1, \ldots, \ell$. As $m \to 0$, $T_B(y_n) \to 0$, and the Monte Carlo
process stops with the first observation, implying no confidence in the initial
probabilities. On the other hand, as $m \to \infty$,

$$T_B(y_n) = m(m+n)^{-1} \sum_{k=y_n}^{\ell} (k - y_n)\,p_0(k) \;\to\; \sum_{k=y_n}^{\ell} (k - y_n)\,p_0(k) = T(y_n),$$

which is the expected improvement for a known multinomial distribution,
indicating complete confidence in the initial probabilities. Thus, the
parameter $m$ can be interpreted as a coefficient of confidence in the initial
probabilities. In fact, it can be considered analogous to the
sample size that would be needed to obtain, through a random sample, the same
quality of estimate of $p_0(k)$ as that given by the specified prior probabilities.
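With $m_k = m\,p_0(k)$, the Bayesian function is simply $m/(m+n)$ times the known-distribution value. The short Python sketch below makes the confidence interpretation of $m$ concrete. The prior $p_0$ and the choice $n = 20$ are illustrative assumptions.

```python
def t_known(y, p0):
    """T(y_n) for the known distribution p0."""
    return sum((k - y) * pk for k, pk in p0.items() if k >= y)

def t_bayes_scaled(y, p0, m, n):
    """T_B(y_n) with m_k = m * p0(k); algebraically equal to m / (m + n) * T(y_n)."""
    return sum((k - y) * m * pk for k, pk in p0.items() if k >= y) / (m + n)

p0 = {1: 0.25, 2: 0.25, 3: 0.5}
for m in (0.1, 10.0, 1_000.0, 1_000_000.0):
    print(m, t_bayes_scaled(1, p0, m, n=20), t_known(1, p0))
```

Small $m$ drives $T_B$ toward zero, so sampling stops almost at once. Large $m$ recovers $T(y_n) = 1.25$.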
STOPPING RULES FOR CONTINUOUS PAYOFFS. In missile range scheduling,
the schedule payoff can be taken to be the sum of the payoffs of scheduled
missions. Each mission payoff is assumed to be a quadratic function of
the mission priority, weighted by a "project readiness" factor. Project
readiness is the probability that the contractor will not cancel the mission
after it has been scheduled. If the priority for mission $i$ is given by $r_i$
and the probability of noncancellation is $q_i$, then the schedule payoff is

$$x = \sum q_i r_i^2,$$

where the summation is over all missions that can be scheduled when using a
given permutation. Since $q_i$ is a number between 0 and 1, it is evident that
the payoffs will not be integers. Furthermore, the number of different
possible payoff values is large, and these values are unknown. Thus, for
missile range problems the payoff can essentially be considered a continuous
random variable.
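The schedule payoff above is a readiness-weighted sum of squared priorities. The one-function Python sketch below assumes the reconstructed form $x = \sum q_i r_i^2$. The three missions are invented for illustration.

```python
def schedule_payoff(scheduled):
    """x = sum of q_i * r_i**2 over the missions that could be scheduled.

    Each mission is (r_i, q_i): its priority and its probability of non-cancellation.
    """
    return sum(q * r ** 2 for r, q in scheduled)

print(schedule_payoff([(3, 0.5), (2, 1.0), (1, 0.25)]))  # 8.75
```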

To determine the prior distribution of $\theta$ for discrete payoffs, it was
proposed that values of $m_k$ be obtained through the specification of the
initial probabilities $p_0(k)$. One way of estimating these initial
probabilities is by integrating a continuous function over a unit interval
that contains the point $k$. This suggests that a limiting procedure could
result in a stopping rule for continuous payoffs.

Suppose that the sequence payoffs can assume arbitrary values in the
interval $[0, \ell]$, and let $A_1, \ldots, A_v$ be any partition of this interval, where
$A_k$ is defined as $A_k = (x'_{k-1}, x'_k]$, $k = 2, 3, \ldots, v$, and $A_1 = [0, x'_1]$,
with $x'_0 = 0$. Suppose $H(x)$ is a distribution function on $[0, \ell]$ such that

$$\Delta H(x'_k) = H(x'_k) - H(x'_{k-1})$$

reflects the experimenter's prior intuition for the initial probabilities
for each partition element $A_k$, $k = 1, \ldots, v$, regardless of the method of
partitioning. If $x''_k$ is any point in $A_k$, then the integral defined by

$$m(m+n)^{-1} \int_{y_n}^{\ell} (x - y_n)\,dH(x) = \lim m(m+n)^{-1} \sum_{k=1}^{v} (x''_k - y_n)\,\bigl(H(x'_k) - H(x'_{k-1})\bigr)\,I(x''_k \ge y_n),$$

where the limit is taken as the partition is refined, is the expected gross
improvement in payoff for an additional observation, and is denoted by
$T_B(y_n)$. $I(x)$ is the usual indicator or characteristic function.

As an example, assume the normal distribution reflects the experimenter's
beliefs for a particular set of initial probabilities. If $\Phi$ denotes
the standardized normal distribution function (zero mean and unit variance)
and $\varphi$ its corresponding density function, then the stopping rule function
becomes

$$T_B(y_n) = m(m+n)^{-1}\left\{\sigma\left[\varphi\!\left(\frac{y_n-\mu}{\sigma}\right) - \varphi\!\left(\frac{\ell-\mu}{\sigma}\right)\right] + (\mu - y_n)\left[\Phi\!\left(\frac{\ell-\mu}{\sigma}\right) - \Phi\!\left(\frac{y_n-\mu}{\sigma}\right)\right]\right\},$$

where $\mu$ and $\sigma$ are the mean and standard deviation, respectively, of the
prior distribution. It is suggested that a set of initial observations be used
to estimate the parameters $\mu$ and $\sigma$.
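The closed form follows from substituting $z = (x - \mu)/\sigma$ in $\int_{y_n}^{\ell} (x - y_n)\,dH(x)$, using $\int z\,\varphi(z)\,dz = -\varphi(z)$. The Python sketch below evaluates it and checks it against a fine midpoint Riemann sum, in the spirit of the limiting argument above. All numerical values in the usage line are hypothetical, chosen only to resemble the scale of the example.

```python
import math

def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def t_normal(y, ell, mu, sigma, m, n):
    """Closed-form T_B(y_n) for a normal H with mean mu and standard deviation sigma."""
    a, b = (y - mu) / sigma, (ell - mu) / sigma
    integral = sigma * (phi(a) - phi(b)) + (mu - y) * (Phi(b) - Phi(a))
    return m / (m + n) * integral

def t_riemann(y, ell, mu, sigma, m, n, v=100_000):
    """Midpoint Riemann sum of m/(m+n) * integral_{y}^{ell} (x - y) dH(x), for comparison."""
    h = (ell - y) / v
    total = 0.0
    for i in range(v):
        x = y + (i + 0.5) * h
        total += (x - y) * phi((x - mu) / sigma) / sigma * h
    return m / (m + n) * total

args = (186.0, 189.5, 185.0, 1.5, 10.0, 19)  # y_n, l, mu, sigma, m, n (illustrative)
print(t_normal(*args), t_riemann(*args))
```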


EXAMPLE. Figure 1 gives the results when 40 missions were requested
and the total number of different resources for these missions
was 116. If all 40 missions had been scheduled, the schedule payoff would
have been 189.5.

By scheduling via a Monte Carlo procedure, a total of 19 schedules
were generated. The corresponding schedule payoffs, $x_n$, are indicated in
the second column, with the maximum payoffs, $y_n$, given in the third column.
The next column gives the expected improvements, $T_B(y_n)$. In this problem the
value of $c$ was 0.001. Three missions, all of low payoff value, were not
scheduled.

It should be noted that the expected improvement from continued sampling
is 0.00090, which is below $c = 0.001$. That is, if the computer were permitted
to continue generating schedules, the expected improvement in schedule payoff
over the maximum of 186.88, obtained by stopping with the nineteenth
observation, would be only this much. This, of course, is a measure of the
quality of the schedule finally chosen.

CONCLUSIONS AND LIMITATIONS. Myopic stopping rules have been applied
to missile range scheduling at White Sands Missile Range with very
satisfactory results. A schedule can be generated in one-twentieth
of a second, so hundreds of schedules can be observed in a few minutes,
and the Bayesian stopping rules are very effective in determining the
stopping point in the sampling procedure.

However, one problem remains. It has been shown that the myopic
procedure is optimal for random variables with known distributions. When
the probability values are not known, the myopic rule is not appropriate.
The fundamental distinction is that when the distribution is completely
specified, the observations are independent; that is, knowledge of the
values of some of the observations provides the experimenter with no
additional information about the values of the other observations. On the other
hand, if the distribution involves the value of one or more parameters that
have prior distributions, the observations are dependent under their joint
marginal distribution. Hence, knowledge of the values of some of the
observations will, by providing information about the values of the parameters,
also provide information about the values of the other observations. This
difference between independent and dependent observations distinguishes
these two types of problems. The observations in a random sample from a
distribution involving unknown values of parameters will no longer be
independent.

In general, it is felt that myopic rules applied to random variables
with unknown probabilities, using a Bayesian analysis, will provide
stopping rules that are "near-optimal". Preliminary analysis for the
multinomial indicates that such a rule may be conservative, that is, may require
more observations than necessary before stopping, but this is not certain.

So, until more accurate results are available, the myopic rules will be
used as good approximations that will yield "near-optimal" sequences.


LARGEST POSSIBLE TOTAL EXPECTED SCHEDULE PAYOFF IS 107.500

N    SCHEDULE PAYOFF    LARGEST PAYOFF SO FAR    EXPECTED IMPROVEMENT

FIGURE 1
Computer Printout of Monte Carlo Scheduling
ABSTRACT


A procedure is given for constructing approximate confidence limits for
P(Y < X), where X and Y are independent random variables, the distribution of
Y being known and normal and the distribution of X being unknown and positively
skewed. The problem of determining the probability that the sidewall of a
combustible cartridge case will not be burned through prior to firing an
artillery round in an automatic firing cycle, given that it is ignited by
smoldering residue after chambering, is used to illustrate the technique. A
listing of a computer subroutine for the procedure is also given.

INTRODUCTION 


An artillery round with a combustible cartridge case is fired from a
weapon using a control system which loads the round, aims the weapon and fires
the weapon, all automatically. At least two rounds of ammunition are fired
in this fashion, and some smoldering residue from the preceding round may
remain in the chamber of the weapon when a round is loaded. Let R be the
conditional probability that the sidewall of the cartridge case of the chambered
round is not burned through prior to firing, given that it is instantaneously
ignited by smoldering residue. The problem is to find both a point estimate
and a 95% lower confidence limit for R, using information concerning the
gun cycle time and data on cartridge case burn-through time obtained from
laboratory tests.

There were sufficient gun cycle time data available to justify the
assumption that the elapsed time between chambering and firing a round is
normally distributed with (true) mean $\mu_2 = 2.9$ seconds and standard deviation
$\sigma_2 = 0.13$ seconds.
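For orientation only, here is a point estimate of R that is not the paper's procedure. The paper's technique constructs approximate confidence limits for a positively skewed X, and this sketch does not reproduce it. Because Y is normal with known $\mu_2$ and $\sigma_2$, $P(Y < X \mid X = x) = \Phi((x - \mu_2)/\sigma_2)$. Averaging this over observed burn-through times therefore gives an unbiased plug-in estimate of R without assuming any shape for X. The burn-through values in the usage line are hypothetical.

```python
import math

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def plug_in_reliability(burn_times, mu_y=2.9, sigma_y=0.13):
    """Estimate R = P(Y < X) as the mean of Phi((x_i - mu_y) / sigma_y).

    Y (chambering-to-firing time) is treated as known normal; X (burn-through
    time) enters only through its observed sample, so no shape is assumed for it.
    """
    return sum(normal_cdf((x - mu_y) / sigma_y) for x in burn_times) / len(burn_times)

print(plug_in_reliability([2.7, 3.0, 3.4, 5.1]))  # hypothetical burn-through times (s)
```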

An  experiment  was  conducted  to  estimate  the  statistical  distribution  of 
cartridge  case  sidewall  burn-through  times.  One  hundred  and  fifty  samples  of 
sidewall  material  were  taken  from  several  cartridge  cases  and  tested.  Each 
sample  of  sidewall  material  was  ignited  and  the  elapsed  time  between  ignition 
and  burn-through  was  measured  by  three  observers  using  stop  watches.  No  data 
were  obtained  for  two  samples  and  there  were  some  missing  data  for  some  of 
the  item  under  consideration,  it  was  necessary  to  sample  cartridge  cases 
from  only  one  lot  and  assume  that  this  sample  was  randomly  selected  from  the 
conceptual  population  consisting  of  all  such  cartridge  cases  which  will  be 
manufactured.  It  was  suggested  that  the  validity  of  this  assumption  should 
be  verified  by  further  testing  when  a  sample  which  is  more  representative 
of  the  production  item  can  be  selected. 
The burn-through data were analyzed assuming a one-way
classification, components-of-variance (random effects) analysis-of-variance
model with unequal numbers of observations per cell. The
component of variance attributable to observers was much smaller than
the among-samples component of variance (0.08 sec² vs. 2.23 sec²)
and was not statistically significant at the $\alpha = 0.001$ level of significance.
It was concluded that the precision of measurement resulting from the
use of observers with stop watches was adequate.
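The variance-component estimates are not worked out in the text. A minimal sketch of the standard ANOVA (method-of-moments) estimators for a one-way random-effects model with unequal cell sizes follows. It assumes each cell is one sidewall sample and the replicates within a cell are the stop-watch readings, so the within-cell component plays the role of the observer component; that reading of the design, and the name `variance_components`, are assumptions for illustration.

```python
from math import fsum

def variance_components(groups):
    """Method-of-moments (ANOVA) estimates for y_ij = mu + a_i + e_ij,
    a one-way random-effects model with unequal group sizes n_i.

    Returns (within-cell component, among-cells component)."""
    k = len(groups)                                   # number of cells (samples)
    sizes = [len(g) for g in groups]
    total = sum(sizes)
    grand_mean = fsum(fsum(g) for g in groups) / total
    cell_means = [fsum(g) / len(g) for g in groups]
    ss_among = fsum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, cell_means))
    ss_within = fsum((y - m) ** 2 for g, m in zip(groups, cell_means) for y in g)
    ms_among = ss_among / (k - 1)
    ms_within = ss_within / (total - k)
    # Effective cell size n0 replaces the common n of the balanced case.
    n0 = (total - fsum(n * n for n in sizes) / total) / (k - 1)
    # A negative estimate is conventionally truncated at zero.
    among = max(0.0, (ms_among - ms_within) / n0)
    return ms_within, among
```

The significance of the smaller component would then be judged with the usual F ratio of the two mean squares.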

The data were used to test the hypothesis of normality of the
distribution of burn-through times, a requirement of the Church-Harris
procedure used for estimating R. A chi-square goodness-of-fit test
rejected this hypothesis at the 0.05 level of significance but
accepted it at the 0.01 level. Since the chi-square test, which
is relatively insensitive to departures from normality in the tails
of a distribution, was inconclusive, the statistic
$b_1 = n\left[\sum_i (x_i - \bar{x})^3\right]^2 \left[\sum_i (x_i - \bar{x})^2\right]^{-3}$ was calculated and used to test
the hypothesis that the distribution is not skewed. This hypothesis
was rejected at the 0.02 level of significance (a two-tail test with
0.01 probability in each tail was used), so it was inferred that the
distribution is positively skewed. Next, Craig's procedure [2] was
used to determine which member of the Pearson system of frequency curves
best describes the data. It was found that the Pearson Type III curve
(a gamma density function) fits the data best. From Carver's table of
the standardized Type III function [3], it was verified that the lower
tail of the Pearson Type III curve contains less area in the interval
$-\infty < x \le \mu + a$ than a normal curve with the same mean and variance.

This indicated that a normality assumption would lead to conservative
point and interval estimates of R, i.e., if the estimates are biased,
the bias will be such that R is underestimated.
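The skewness statistic $b_1$ used above is straightforward to compute. A minimal sketch follows; it also returns the signed root $\sqrt{b_1}$, whose sign (that of the third-moment sum) shows the direction of skew. The critical values for the two-tail test still come from published tables and are not computed here; the name `skewness_b1` is hypothetical.

```python
from math import copysign, fsum, sqrt

def skewness_b1(xs):
    """Return (b1, signed sqrt(b1)) with
    b1 = n [sum (x - xbar)^3]^2 / [sum (x - xbar)^2]^3."""
    n = len(xs)
    xbar = fsum(xs) / n
    s3 = fsum((x - xbar) ** 3 for x in xs)   # sum of cubed deviations
    s2 = fsum((x - xbar) ** 2 for x in xs)   # sum of squared deviations
    b1 = n * s3 ** 2 / s2 ** 3
    # A positive root indicates positive (right) skew, as found for burn-through times.
    return b1, copysign(sqrt(b1), s3)
```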
For the purpose of estimating R, it was assumed that cartridge
case sidewall burn-through time is normally distributed, with estimates
of the mean and standard deviation of the distribution being $\bar{X} = 9.71$
seconds and $S = 1.49$ seconds, respectively.

APPLICATION  OF  A  MODIFIED  CHURCH-HARRIS  PROCEDURE 

The procedure used for estimating R is based on the work of Church
and Harris [1]. Let the random variable X be the cartridge case sidewall
burn-through time in seconds and the random variable Y be the gun cycle
time in seconds. Assume that X and Y are statistically independent and
both normally distributed. Introducing the notation


$E(X) = \mu_1$
$\mathrm{VAR}(X) = \sigma_1^2$
$E(Y) = \mu_2$
$\mathrm{VAR}(Y) = \sigma_2^2$

and defining the random variable $W = Y - X$, it follows that W is normally
distributed with $E(W) = \mu_2 - \mu_1$ and $\mathrm{VAR}(W) = \sigma_1^2 + \sigma_2^2$. Then

$$R = P\{Y < X\} = P\{Y - X < 0\} = P\{W < 0\}.$$


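We next make the transformation

$$Z = \frac{W - (\mu_2 - \mu_1)}{\sqrt{\sigma_1^2 + \sigma_2^2}}$$

so that Z is distributed normally with $E(Z) = 0$ and $\mathrm{VAR}(Z) = 1$. Then

$$R = P\{W < 0\} = P\left\{Z < \frac{\mu_1 - \mu_2}{\sqrt{\sigma_1^2 + \sigma_2^2}}\right\} = \Phi\left(\frac{\mu_1 - \mu_2}{\sqrt{\sigma_1^2 + \sigma_2^2}}\right) \qquad (1)$$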
where $\Phi(\cdot)$ is the standard normal cumulative distribution function.
Substituting the known values $\mu_2$ and $\sigma_2^2$ and the estimated values
$\hat{\mu}_1 = \bar{X}$ and $\hat{\sigma}_1^2 = S^2$ into (1) yields the point estimate

$$\hat{R} = \Phi\left(\frac{\bar{X} - \mu_2}{\sqrt{S^2 + \sigma_2^2}}\right).$$
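The point estimate can be evaluated directly with Python's standard library, where `statistics.NormalDist().cdf` plays the role of $\Phi$. This is a sketch using the rounded values quoted in the text ($\bar{X} = 9.71$, $S = 1.49$, $\mu_2 = 2.9$, $\sigma_2 = 0.13$); the function name `r_hat` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def r_hat(xbar, s, mu2, sigma2):
    """Point estimate R-hat = Phi((xbar - mu2) / sqrt(s^2 + sigma2^2)); also returns V."""
    v = (xbar - mu2) / sqrt(s ** 2 + sigma2 ** 2)
    return v, NormalDist().cdf(v)

v, r = r_hat(9.71, 1.49, 2.9, 0.13)
print(f"V = {v:.3f}, R-hat = {r:.7f}")
```

With these rounded inputs V is about 4.55 and $\hat{R}$ is about 0.999997, in line with the 0.9999971 reported in the Summary.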


Now, having a point estimate of R, we proceed to approximate
the probability distribution of the random variable $\hat{R}$ and use this to
construct an approximate $100(1 - \gamma)\%$ lower confidence limit for R.

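In doing so, we make use of the facts that $\bar{X}$ is normally distributed
with $E(\bar{X}) = \mu_1$ and $\mathrm{VAR}(\bar{X}) = \sigma_1^2/n$; that $S^2$ is asymptotically normally
distributed with $E(S^2) = \sigma_1^2$ and $\mathrm{VAR}(S^2) = 2\sigma_1^4/(n-1)$; and that $\bar{X}$ and $S^2$
are statistically independent.

Let $T = S^2 - \sigma_1^2$, so that $E(T) = 0$ and $\mathrm{VAR}(T) = 2\sigma_1^4/(n-1)$,
and define

$$V = \frac{\bar{X} - \mu_2}{\sqrt{S^2 + \sigma_2^2}}.$$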


Expanding V in a Taylor series about the point $[E(\bar{X}), E(T)] = (\mu_1, 0)$
we obtain

$$V = \frac{\bar{X} - \mu_2}{\sqrt{\sigma_1^2 + \sigma_2^2}} - \frac{(\mu_1 - \mu_2)\,T}{2\left(\sigma_1^2 + \sigma_2^2\right)^{3/2}} + O\!\left(\frac{1}{n}\right)$$

with probability one.

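Because of the independence of $\bar{X}$ and T, the distribution of V is
asymptotically normal with

$$E(V) = \frac{\mu_1 - \mu_2}{\sqrt{\sigma_1^2 + \sigma_2^2}}$$

and

$$\mathrm{VAR}(V) = \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}\left[\frac{1}{n} + \frac{(\mu_1 - \mu_2)^2\,\sigma_1^2}{2(n-1)\left(\sigma_1^2 + \sigma_2^2\right)^2}\right].$$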
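The delta-method variance of V can be checked by simulation under the paper's normality assumptions. The sketch below is not part of the original analysis: it draws normal samples of size n with $\sigma_1$ set to the observed S, computes V for each, and compares the empirical variance with the formula. The sample size n = 100, the number of replications, and the function names are arbitrary choices for illustration.

```python
import random
from math import fsum, sqrt

def var_v_formula(mu1, sigma1, mu2, sigma2, n):
    """First-order (Taylor-series) approximation to VAR(V)."""
    s2 = sigma1 ** 2 + sigma2 ** 2
    return (sigma1 ** 2 / s2) * (1 / n + (mu1 - mu2) ** 2 * sigma1 ** 2 / (2 * (n - 1) * s2 ** 2))

def var_v_simulated(mu1, sigma1, mu2, sigma2, n, reps, seed=1):
    """Empirical variance of V = (xbar - mu2) / sqrt(S^2 + sigma2^2) over simulated samples."""
    rng = random.Random(seed)
    vs = []
    for _ in range(reps):
        xs = [rng.gauss(mu1, sigma1) for _ in range(n)]
        xbar = fsum(xs) / n
        s2 = fsum((x - xbar) ** 2 for x in xs) / (n - 1)   # sample variance S^2
        vs.append((xbar - mu2) / sqrt(s2 + sigma2 ** 2))
    m = fsum(vs) / reps
    return fsum((v - m) ** 2 for v in vs) / (reps - 1)

args = (9.71, 1.49, 2.9, 0.13, 100)
approx = var_v_formula(*args)
sim = var_v_simulated(*args, reps=4000)
print(f"delta-method VAR(V) = {approx:.4f}, simulated = {sim:.4f}")
```

The two should agree closely. The neglected higher-order terms tend to make the true variance slightly larger for moderate n, with the gap shrinking as n grows.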

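Using $S^2$ to estimate $\sigma_1^2$ and $\bar{X}$ to estimate $\mu_1$, we obtain as an estimate
of the standard deviation of V

$$\hat{\sigma}_V = \frac{S}{\sqrt{S^2 + \sigma_2^2}}\left[\frac{1}{n} + \frac{(\bar{X} - \mu_2)^2\,S^2}{2(n-1)\left(S^2 + \sigma_2^2\right)^2}\right]^{1/2}$$

and

$$R = \Phi\left(\frac{\mu_1 - \mu_2}{\sqrt{\sigma_1^2 + \sigma_2^2}}\right) = \Phi[E(V)].$$

It follows that

$$P\left\{R > \Phi\left[V - \Phi^{-1}(1 - \gamma)\,\hat{\sigma}_V\right]\right\} \approx 1 - \gamma. \qquad (2)$$

We use (2) to obtain the $100(1 - \gamma)\%$ lower confidence limit for R as

$$\mathrm{LCL}_{1-\gamma} = \Phi\left[V - \Phi^{-1}(1 - \gamma)\,\hat{\sigma}_V\right].$$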
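Put together, the estimator and its lower confidence limit can be sketched from summary statistics alone. In this sketch `NormalDist().inv_cdf` stands in for $\Phi^{-1}$. The sample size n = 148 in the usage line is an assumption (150 specimens less the two that gave no data), not a figure stated for the analysis, and `church_harris_lcl` is a hypothetical name.

```python
from math import sqrt
from statistics import NormalDist

def church_harris_lcl(xbar, s, n, mu2, sigma2, gamma=0.05):
    """Return (V, sigma_V-hat, R-hat, LCL_{1-gamma}) for the modified Church-Harris procedure."""
    s2 = s ** 2 + sigma2 ** 2                          # S^2 + sigma_2^2
    v = (xbar - mu2) / sqrt(s2)
    sigma_v = (s / sqrt(s2)) * sqrt(1 / n + (xbar - mu2) ** 2 * s ** 2 / (2 * (n - 1) * s2 ** 2))
    nd = NormalDist()
    z = nd.inv_cdf(1 - gamma)                          # Phi^{-1}(1 - gamma)
    return v, sigma_v, nd.cdf(v), nd.cdf(v - z * sigma_v)

v, sigma_v, r_hat, lcl = church_harris_lcl(9.71, 1.49, 148, 2.9, 0.13)
print(f"V = {v:.3f}  sigma_V = {sigma_v:.3f}  R-hat = {r_hat:.7f}  LCL.95 = {lcl:.7f}")
```

Under that assumed n the limit comes out near $1 - 0.2 \times 10^{-4}$, the same order as the reported value; an exact match would require the unrounded data.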
COMPUTATIONAL RESULTS


For the combustible cartridge case problem we want a 95% lower
confidence limit for R. Since $\Phi^{-1}(0.95) = 1.645$, this limit is given by

$$\mathrm{LCL}_{.95} = \Phi(V - 1.645\,\hat{\sigma}_V).$$

A computer program (Appendix A) was prepared to calculate V, $\hat{R}$, $\hat{\sigma}_V$
and a 95% lower confidence limit for R. In this particular problem a
95% lower confidence limit of $1 - 10^{-4}$ was considered satisfactory to
assure the margin of safety required. The 95% lower confidence limit
was determined to be $1 - 0.2 \times 10^{-4}$.

Next, the question of how much the mean gun cycle time can be
increased and still leave a 95% lower confidence limit of $1 - 10^{-4}$
for R was considered. To answer this question it was assumed that the
coefficient of variation of the random variable Y, $\sigma_2/\mu_2$, remained
constant as $\mu_2$ increased from 2.9 seconds to 3.9 seconds in steps of
0.1 seconds. V, $\hat{R}$, $\hat{\sigma}_V$ and the 95% lower confidence limit for R were
calculated at each step. It was found that the mean gun cycle time
can be increased by as much as 0.5 seconds without the 95% lower confidence
limit for R falling below $1 - 10^{-4}$.
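The sensitivity study can be reproduced in outline as follows. The sketch holds the coefficient of variation $\sigma_2/\mu_2$ fixed at its initial value and steps $\mu_2$ from 2.9 s to 3.9 s. It assumes N = 148 and uses the rounded $\bar{X}$ and S, so the step at which the limit first drops below $1 - 10^{-4}$ need not reproduce the paper's 0.5 s figure exactly; `lcl95` is a hypothetical name.

```python
from math import sqrt
from statistics import NormalDist

def lcl95(xbar, s, n, mu2, sigma2):
    """95% lower confidence limit Phi(V - 1.645 sigma_V-hat)."""
    s2 = s ** 2 + sigma2 ** 2
    v = (xbar - mu2) / sqrt(s2)
    sigma_v = (s / sqrt(s2)) * sqrt(1 / n + (xbar - mu2) ** 2 * s ** 2 / (2 * (n - 1) * s2 ** 2))
    return NormalDist().cdf(v - 1.645 * sigma_v)

XBAR, S, N = 9.71, 1.49, 148          # N = 148 is an assumed sample size
MU2_0, SIGMA2_0 = 2.9, 0.13
CV = SIGMA2_0 / MU2_0                  # coefficient of variation held constant

results = []
for step in range(11):                 # mu2 = 2.9, 3.0, ..., 3.9 seconds
    mu2 = round(MU2_0 + 0.1 * step, 1)
    results.append((mu2, lcl95(XBAR, S, N, mu2, CV * mu2)))

acceptable = [mu2 for mu2, lcl in results if lcl >= 1 - 1e-4]
for mu2, lcl in results:
    print(f"mu2 = {mu2:.1f} s   LCL.95 = {lcl:.7f}")
print("largest acceptable mean cycle time:", max(acceptable))
```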

SUMMARY  AND  CONCLUSIONS 

The application of the Church-Harris technique to the combustible
cartridge case problem can be summarized as follows. The analysis
depended on two critical assumptions: (a) the sample of cartridge case
sidewall material used to obtain burn-through time measurements was a
random sample from the conceptual population consisting of all cartridge
cases of the same type which will be manufactured in the future, and (b)
the distributions of X and Y are normal. Assumption (a) seemed
questionable, and it was suggested that further testing be done to
verify it. Assumption (b) was not satisfactorily established by the
data, but the analysis indicated that, if the assumption is not valid,
the inferences drawn from the study will be on the safe side, i.e.,
R will be underestimated. The conditional probability that the sidewall
of a cartridge case of a chambered round will not be burned through prior
to firing, given that it is instantaneously ignited by smoldering residue
remaining from a round previously fired, was estimated to be 0.9999971, with
a 95% lower confidence limit of 0.9999779. It was also determined that the
mean gun cycle time can be increased by as much as 0.5 seconds (a 17% increase)
without the 95% lower confidence limit falling below 0.9999, provided that the
coefficient of variation remains constant.
APPENDIX A

The following is a listing of a subroutine CHAR (CHurch-HARris) and
a representative driving program written for this study. Notice that CHAR
requires as input X (a vector of data), $\mu_2$, $\sigma_2$, $\gamma$, N and outputs $\bar{X}$, S, $\hat{R}$, $\hat{\sigma}_V$,
and LCL. Subroutine CHAR calls subroutines FND and FINVND to evaluate
the normal distribution function and the inverse of the normal distribution
function, respectively.
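The FORTRAN listing itself is not reproduced here. As an illustration only, a Python analogue with the same inputs and outputs is sketched below. `statistics.NormalDist.cdf` and `inv_cdf` stand in for FND and FINVND, N is taken as the length of the data vector rather than passed separately, and the names `char`, `fnd` and `finvnd` are hypothetical.

```python
from math import fsum, sqrt
from statistics import NormalDist

_ND = NormalDist()
fnd = _ND.cdf          # stands in for FND (normal distribution function)
finvnd = _ND.inv_cdf   # stands in for FINVND (its inverse)

def char(x, mu2, sigma2, gamma):
    """Data vector in; (X-bar, S, R-hat, sigma_V-hat, LCL) out."""
    n = len(x)
    xbar = fsum(x) / n
    s = sqrt(fsum((xi - xbar) ** 2 for xi in x) / (n - 1))
    s2 = s * s + sigma2 * sigma2
    v = (xbar - mu2) / sqrt(s2)
    sigma_v = (s / sqrt(s2)) * sqrt(1 / n + (xbar - mu2) ** 2 * s * s / (2 * (n - 1) * s2 * s2))
    r_hat = fnd(v)
    lcl = fnd(v - finvnd(1 - gamma) * sigma_v)
    return xbar, s, r_hat, sigma_v, lcl
```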

The  output  of  the  sample  program  is  formatted  as  appears  below.  The 
dimension  statements  (DIMENSION  X(500))  appearing  in  the  driving  program 
may  be  modified  to  accommodate  the  data  available. 
READ INPUT PARAMETERS
STATISTICAL EVALUATION OF FLIGHT TEST PERFORMANCE
OF THE HELICOPTER LIFT MARGIN SYSTEM (HLMS)

Erwin Biser and Ronald Kurowsky
Avionics  Laboratory,  USAECOM 
Fort  Monmouth,  New  Jersey 


ABSTRACT.  An  item  urgently  needed  by  utility  aircraft  is  a  measurement  of 
lift  margin.  Lift  Margin  is  defined  as  the  maximum  available  lift  minus 
the  effective  gross  weight  of  the  aircraft.  It  is  the  intent  of  this 
program  to  define  a  method  whereby  this  value  is  automatically  presented 
to  the  pilot  at  all  times. 

Such  a  system  is  presently  being  designed  and  built  under  the  auspices  of 
the  Joint-Army-Navy-Aircraft-Instrument-Research  (JANAIR)  program.  This 
system will also have the added capability of forecasting Lift Margin (L.M.)
to  and  at  a  given  destination  if  altitude  and  ambient  air  temperature  are 
known.  The  factors  that  are  most  likely  to  affect  the  performance  of 
HLMS  are  torque,  air  temperature,  altitude,  fuel  weight,  and  aircraft  weight. 

The  flight  test  evaluation  of  HLMS  will  be  performed  at  US  Army  Systems 
Test  Activity  (ASTA),  Edwards  AFB,  California;  and  data  collected  on  a 
pulse-code-FM  modulated  (PCM)  system  will  be  reduced  to  digital  format  for 
evaluation.  A  reference  air  density  system,  corrected  for  humidity  and 
calibrated by the National Bureau of Standards, will be used to define errors
caused  by  calculating  air  density  directly  from  pressure  and  temperature. 

A  statistical  analysis  of  the  performance  of  the  entire  HLMS  and  its 
subsystems  (components/sensors)  is  being  undertaken  by  means  of  error 
models  to  determine  and  validate  the  effectiveness  of  HLMS.  The  objective 
is  to  obtain  regression  equations  of  lift  margin  as  a  function  of  torque, 
temperature, pressure, etc.

Lift Margin (L.M.) = Maximum Available Lift (MAL) - Effective Gross Weight
(EGW). LM = 0 is the point at which the aircraft cannot hover at a higher altitude under its present loading conditions.

INTRODUCTION .  The  present  work  was  undertaken  because  it  was  realized  that 
the safety and utility of helicopters would be substantially improved if
the  pilot  knew  at  all  times  the  lift  capability  of  the  vehicle.  There  have 
been  a  number  of  accidents  wherein  the  major  cause  was  attributed  to  the 
inability  of  the  pilot  to  accurately  assess  the  lifting  ability  of  the 
vehicle  and  the  helicopter  "stalled".  This  phenomenon,  usually  referred 
to  as  "settling-with  power",  typically  occurs  when  the  helicopter  attempts 
to  take  off  or  land  at  high  vehicle  gross  weights  and  high  density  altitudes. 
If  a  helicopter  should  land  in  the  early  morning  when  the  air  is  cool  and 
relatively dense, it might be impossible for the helicopter to take off
with  the  same  load  in  the  afternoon  when  the  air  is  warm. 

Theoretically, it is possible to account for temperature, pressure,
humidity  and  varying  loads  by  the  use  of  slide  rules  and  charts  supplied 
in  aircraft  handbooks  or  manuals  but  such  is  simply  not  convenient  or 
fast  enough  to  be  practical  in  most  operating  situations.  In  addition, 
there  is  the  tendency  to  feel  that  once  the  computation  has  been  made,  the 
results  will  hold  for  a  substantial  length  of  time.  Changes  in  ambient 
conditions  over  a  period  of  just  a  few  hours  may  completely  alter  the 
capability  of  the  helicopter. 

For  these  reasons  a  Helicopter  Lift  Margin  System  is  extremely  desirable. 
Such  a  system  will  indicate  to  the  pilot  the  present  potential  lift  in 
excess  of  the  weight  of  the  helicopter.  Therefore,  before  slowing  down 
and  landing  the  pilot  can  check  the  helicopter's  lift  margin  and  determine 
if  it  is  safe  to  hover  and  complete  a  vertical  landing,  or  if  a  rolling 
landing  (STOL  type)  should  be  made,  or  if  in  fact  it  is  totally  unsafe  to 
land  in  a  restricted  space. 

The  major  objective  of  the  Helicopter  Lift  Margin  System  is  to  demonstrate 
the  feasibility  of  continuously  computing  helicopter  lift  margin  with  a 
desired  accuracy  under  varying  operational  conditions.  Other  objectives 
such  as  determining  the  lift  variations  due  to  air  density  measurements 
with  and  without  humidity  inputs  and  the  empirical  use  of  torque  to 
measure  fuel  consumed  will  be  studied. 

DEFINITIONS  OF  PRINCIPAL  PARAMETERS: 

Helicopter Lift Margin = Maximum Available Lift - Effective Gross Weight
Maximum Available Lift (MAL) - The maximum lift that can be generated by
the rotor under ambient conditions of air temperature, air density, altitude, air speed, ground effect and engine characteristics.

Effective  Gross  Weight  (EGW) -  The  apparent  weight  of  the  helicopter  as 
"seen"  by  the  rotor  under  hover  conditions  considering  air  density,  ground 
effect  and  engine  characteristics. 


UNDERLYING  CONCEPT.  A  concept  by  which  lift  margin  may  be  obtained  is 
illustrated  in  Figures  1  through  3. 

As  shown  in  Figure  1,  lift  margin  is  obtained  by  generating  a  current 
proportional  to  "Potential  Lift".  A  second  current  that  is  proportional 
to  weight  is  subtracted  from  potential  lift.  The  remainder  represents  lift 
margin.

Figure  2  illustrates  that  available  lift  is  obtained  from  the  computation 
of  potential  horsepower  multiplied  by  the  ratio  of  lift  to  horsepower. 
Potential  horsepower  is  that  horsepower  that  can  be  obtained  from  the 
engine  under  the  existing  ambient  conditions.  Similarly,  the  ratio  of  lift 
to  horsepower  is  that  ratio  which  holds  for  the  existing  ambient  conditions. 

Figure 3 shows the general concept for obtaining helicopter weight. The
horsepower actually being used is measured and converted to weight by
multiplying by the lift-to-horsepower ratio appropriate for the existing
ambient conditions. This ratio is identical with the ratio used for the
potential lift computation.

The  weight  of  the  helicopter  is  not  computed  continuously  but  is  computed 
under  flight  conditions  suitable  for  the  measurement.  At  other  times  the 
weight  servo  stores  the  weight  so  that  it  is  available  continuously.  The 
weight may be measured while the helicopter is hovering, with or without
wind, with its wheels within a few feet of the ground, i.e., within a
fraction of the rotor diameter of the ground, or in-ground effect (IGE).

The  other  suitable  flight  condition  is  hovering  out-of-ground  effect 
(OGE)  at  zero-indicated  air  speed;  i.e.,  the  helicopter  is  hovering  in 
the  air  mass,  not  necessarily  with  respect  to  the  ground. 


The rest of this article was reproduced photographically from the
authors' manuscript.


FIGURE 1

FIGURE 2

FIGURE 3


II. SYSTEM DESCRIPTION
FLIGHT  TEST  CONSIDERATION: 

Question: Can the feasibility of the helicopter lift margin system be
demonstrated with a meaningful flight test program which avoids the complexity,
cost, and time required to implement the complete system?

Answer:  Yes,  the  basic  system  feasibility  can  be  demonstrated  by  flight 
testing  the  simplified  version  of  the  helicopter  lift  margin  system  as 
shown  in  Figure  4. 

Figure  b  illustrates  a  method  of  representing  the  system  required  to 
determine MAL and EGW. It is noted that the amount of fuel consumed is
obtained by means of an empirical equation based on torque (Q). Operating
in-ground effect (IGE) rather than out-of-ground effect (OGE) increases lift by approximately 15%.
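The lift-margin arithmetic, including the roughly 15% in-ground-effect allowance, can be sketched as follows. Treating the allowance as a fixed multiplier on MAL, together with the names `lift_margin` and `IGE_FACTOR`, are assumptions for illustration, and the weights in the usage line are hypothetical.

```python
IGE_FACTOR = 1.15   # assumed fixed in-ground-effect lift increase (about 15%)

def lift_margin(mal_lb, egw_lb, in_ground_effect=False):
    """Lift margin LM = MAL - EGW in pounds; LM <= 0 means the aircraft cannot hover."""
    mal = mal_lb * IGE_FACTOR if in_ground_effect else mal_lb
    return mal - egw_lb

# Hypothetical example: 9000 lb available lift OGE, 8500 lb effective gross weight.
print(lift_margin(9000.0, 8500.0), lift_margin(9000.0, 8500.0, in_ground_effect=True))
```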

Another  item  is  destination  lift  margin.  An  attempt  to  obtain  a  measure 
of  this  parameter  consists  of  mechanically  inserting  altitude  and  temperature 
into  the  system  with  the  readout  presenting  the  expected  lift  margin. 
Implementation: To simulate the helicopter/engine characteristics in a
computer.

To apply inputs to the computer representing engine torque, rotor speed,
air temperature, altitude, air speed, fuel weight, load changes and ground
effect.

To compute and display continuously and automatically helicopter lift
margin and/or effective gross weight.

To  prove  feasibility,  three  basic  concepts  of  the  helicopter  lift  margin 
system  must  be  verified. 
FIGURE: HELICOPTER LIFT MARGIN SYSTEM (legend: recorder connection; open dot signifies non-essential)


1.  The  helicopter/engine  dynamic  characteristics  can  be  simulated  in 
a  computer  such  that  maximum  available  lift  can  be  dynamically  computed. 


2.  The  effective  gross  weight  of  a  helicopter  can  be  accurately 
measured  during  a  hover  maneuver. 

3.  The  difference  between  maximum  available  lift  and  effective  gross 
weight is helicopter lift margin.

Flight test of a hover lift computer will verify the above basic concepts.
Lift is derived from the basic equations of helicopter performance
(Gessow & Myers, "Aerodynamics of the Helicopter").

$L = C_T \pi R^2 \rho (\omega R)^2$    Lift

$Q = C_Q \pi R^2 \rho (\omega R)^2 R$    Torque

$P_H = C_P \pi R^2 \rho (\omega R)^3$    Horsepower

where:

R = rotor radius

$\rho$ = air density

$\omega$ = rotor angular velocity

$C_T^{3/2} = 1.414\, C_P M$

M = Figure of merit of rotor system

By operating the aircraft engine at a constant speed (maximum rpm) and
using charts available in the UH-1B helicopter manual, we obtain lift as
$\mathrm{Lift} = m_0 (\mathrm{HP})^{2/3} \rho^{1/3} + b_0$

Mq  =  9T.4/(HP)2/3 

$b_0 = -997$ (a constant)
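The $(\mathrm{HP})^{2/3}\rho^{1/3}$ form of the fitted lift equation is what ideal momentum theory predicts for a hovering rotor. The sketch below is the textbook momentum-theory result corrected by a figure of merit M, not the authors' chart-derived fit: it does not reproduce $m_0$ or $b_0$. The numbers in the usage lines are hypothetical, not UH-1B data, and `hover_thrust` is a hypothetical name.

```python
from math import pi

def hover_thrust(power, rho, radius, figure_of_merit):
    """Momentum-theory hover thrust with figure of merit M.

    From P = T**1.5 / (M * sqrt(2 * rho * A)) it follows that
    T = (2 * rho * A)**(1/3) * (M * P)**(2/3).  SI units: W, kg/m^3, m -> N.
    """
    area = pi * radius ** 2                      # rotor disk area A
    return (2.0 * rho * area) ** (1.0 / 3.0) * (figure_of_merit * power) ** (2.0 / 3.0)

# Hypothetical numbers: same power and rotor, sea-level versus hot-and-high density.
sea_level = hover_thrust(600e3, 1.225, 6.7, 0.7)
hot_high = hover_thrust(600e3, 0.95, 6.7, 0.7)
print(f"thrust ratio hot/high vs sea level: {hot_high / sea_level:.3f}")
```

At fixed power, lift therefore scales as the cube root of density, which is why a morning landing may not be repeatable as an afternoon take-off.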
By proper manipulation of this equation and charts from the operator's
handbook

P  P  ' '  ,  ’ 

MAL  =  K^QasPo  (*i  +  KgT)  +  KgP0  (Kg  +  K^)'1 

P  , 

EGW  =  +  KgPQ  (K  +  K^T)  ‘a 

Qms  =  4p[l0  (Kx  +  K2T)J  "1 

It is these equations that will be used to compute Lift Margin in the HLMS.


III. OBJECTIVE OF EXPERIMENT

Helicopter  Lift  Margin  is  defined  as  the  difference  between  Maximum 
Available Lift and Effective Gross Weight. The accuracy of the system is
dependent  upon  the  accuracy  of  Maximum  Available  Lift  and  Effective  Gross 
Weight.  The  objective  of  the  experiment  is  therefore  to  measure  the  errors 
in  Maximum  Available  Lift  and  Effective  Gross  Weight. 

INDEPENDENT  VARIABLES  OF  THE  SYSTEM: 

1.  Ambient  temperature  (T) 

2.  Compressor  Inlet  Temperature  (CIT) 

3.  Absolute  Pressure  (P) 

4.  Relative  Humidity  (RH) 

5. Torque (Q)

DEPENDENT  VARIABLES: 

1.  Effective  Gross  Weight  (EGW) 

2.  Maximum  Available  Lift  (MAL) 

3. Fuel Flow

4. Air Density ($\rho$)

EGW  =  f(Fuel  Flow,  Hover  Torque,  Air  Density) 

MAL = g(Fuel Flow, Maximum Available Torque, Air Density)
IV. OPERATIONS AND PROCEDURES


Below is described a typical operation of the HLMS, including the
initial elements of acquiring the aircraft effective gross weight and aircraft
engine potential.

1.1 TOPPING. To store the engine potential, the pilot will, after
the aircraft is airborne and the HLMS is turned on,
operate the engine at maximum take-off power, putting
the craft into a sufficient climb angle to limit the
airspeed to a suitable value. The (ENTER DATA) switch
is operated along with the out-of-ground effect (OGE)
and (EXISTING) switches, and the (TOP) pushbutton is depressed
to enter the engine maximum available torque. When
the (TOP) pushbutton becomes illuminated, this torque has been
stored in the memory, its correction to the
maximum standard-condition torque, $Q_{ms}$, being performed
automatically. At this point the (TOP) pushbutton may
be released, engine power reduced, and the (ENTER
DATA) switch restored to the (DISPLAY DATA) condition.

If the (TOP) pushbutton is operated after the (DISPLAY
DATA) switch is activated, the stored torque will be displayed, proper
attention being given (automatically) to the transmission
limit.

1.2 WEIGHING. To store the aircraft effective gross weight (EGW),
the aircraft is brought to a hover in level flight out-of-ground
effect, the (ENTER DATA), (OGE), and (EXISTING)
switches all being activated. The hover pushbutton is
depressed until it becomes illuminated, indicating
that the aircraft weight has been stored in the EGW
memory. This weight is automatically updated for fuel
consumption, as may be demonstrated by operating the
(DISPLAY DATA) and (HOVER WEIGH) (weight) switches,
upon which the steadily-decreasing gross weight will
be displayed.

1.3 Lift Margin. Once the topping (para 1.1) and weighing (para 1.2)
operations have been performed, the HLMS will read out
lift margin continuously in pounds of lift capability
for the ambient conditions surrounding the aircraft.

This  data  will  be  qualified  appropriately  by  the  operation 
of  three  switches  whose  functions  are  described  below. 

1.3.1 (COMP DENS/DIR DENS) PUSHBUTTON. This switch selects either a
computed or externally-supplied value of ambient air
density for calculation of rotor lift. The selection
is specific to this model of HLMS only, and is used to
compare the two methods of deriving air density as applied
to the computation of rotor lift.

1.3.2 (OGE/IGE) PUSHBUTTON. This switch is used to increase the
computed maximum available lift (MAL) by about 15 percent
to approximate the effect of the ground (2-foot skid
height only) in lift margin and weighing operations.

For instance, an OGE lift margin can be computed from
an in-ground-effect (IGE) weight determination. The
manually-controlled Hover Lift Computer was not so
arranged.
In order to develop error models over the entire operational range of
the Lift Margin System, altitude should vary from sea level to 10,000 feet,
temperature from 0°C to +25°C, and relative humidity from 10% to 100%. To
accomplish these variations, four test locations are to be used:

1. Oxnard, Calif.: sea level
2. Edwards AFB, Calif.: 2300 feet
3. Bishop, Calif.: 5300 feet
4. Coyote Flats, Calif.: 9400 feet


Eleven flights per location will be flown. The first flight will check
MAL and engine performance; flights 2-10 will consist of three flights per
day (one in early morning, one in late morning, and one in early evening) for
3 days. The eleventh flight will again check engine degradation over this
period.

By  use  of  these  locations  it  is  expected  that  a  suitable  variation  in 
all  these  parameters  may  be  achieved. 

Another  hypothesis  to  be  tested  is  that  relative  humidity  will  have 
little  or  no  effect  on  Lift  Margin.  By  use  of  the  modified  air  density 
equations  given  by  the  National  Bureau  of  Standards,  error  models  will  be 
developed  to  show  the  effect  of  air  density  measurements  made  with  and 
without the relative humidity input on the performance of the Lift Margin
System.
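The size of the humidity effect on air density can be gauged with a simple ideal-gas mixture model. This is not the NBS equation referred to above. The sketch uses a common Magnus-type approximation for saturation vapour pressure and standard gas constants for dry air and water vapour, and the function names are hypothetical.

```python
from math import exp

R_DRY = 287.05     # J/(kg K), specific gas constant of dry air
R_VAPOR = 461.5    # J/(kg K), specific gas constant of water vapour

def saturation_vapor_pressure_pa(t_c):
    """Magnus-type approximation to saturation vapour pressure over water, in Pa."""
    return 610.94 * exp(17.625 * t_c / (t_c + 243.04))

def air_density(p_pa, t_c, rel_humidity=0.0):
    """Density (kg/m^3) of moist air treated as an ideal mixture of dry air and vapour."""
    t_k = t_c + 273.15
    e = rel_humidity * saturation_vapor_pressure_pa(t_c)   # vapour partial pressure
    return (p_pa - e) / (R_DRY * t_k) + e / (R_VAPOR * t_k)

dry = air_density(101325.0, 25.0)
humid = air_density(101325.0, 25.0, 1.0)
print(f"dry {dry:.4f} kg/m^3, saturated {humid:.4f} kg/m^3")
```

At 25°C, saturated air in this model is roughly one percent less dense than dry air. Through the $\rho^{1/3}$ dependence of lift, that translates into a still smaller lift change, which is consistent with the hypothesis being tested.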
V. ERROR MODELS AND REGRESSION EQUATIONS

It is assumed that the data, the output of the flight tests, will permit
the construction of regression equations. These equations will establish
(probabilistic) functional relations between Lift Margin, air density and
torque, and pressure (a subsystem relationship).

It  is  also  assumed  that  in  view  of  the  anticipated  complexities  of  the 
system  and  the  difficulties  of  obtaining  independent  estimates  of  the  effects 
of the control variables, a predictive linear model will yield the salient
characteristics  of  the  behavior  of  the  Lift  Margin  System  response. 

The  selected  independent  variables  will  be  tried  and  tested  in  the 
regression  equations.  Confidence  intervals  about  the  coefficients  will  be 
determined.  This  type  of  model  will  provide  insight  as  to  the  response  of 
the  lift  margin  system;  and  suggest  guidelines  for  more  meaningful 
experimental  design  in  this  area.  The  model  will  test  the  stability  of 
the  parameters  over  the  sample  space,  i.e.  the  operative  range  of  the 
controlled  (and  uncontrolled)  variables. 

1. Types of Linear Model

a. $E(LM) = a_0 + a_1 X_1 + a_2 X_2 + a_3 X_3 + a_4 X_1^2 + a_5 X_2^2 + a_6 X_1 X_3 + a_7 X_1 X_2 + a_8 X_2 X_3$

This is a linear model, linear in the parameters $a_i$; $i = 0, 1, \ldots, 8$.

The  highest  power  of  an  independent  variable  is  called  the  order  of  the  model. 
It is to be noted that the linearity of a model refers to the linearity (or
non-linearity) in the coefficients.


The test data will enable us to obtain estimates of the true coefficients
$a_0, a_1, \ldots, a_8$, where

$X_1$ = Torque

$X_2$ = Air density

$X_3$ = Pressure

Estimation theory will be applied to obtain estimates of these coefficients,
$b_0, b_1, b_2$, etc., with tolerable confidence limits. The ultimate purpose is
to obtain an "optimal" regression equation.
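A minimal pure-Python sketch of fitting model (a) by least squares follows. It solves the normal equations $X'X\,b = X'y$ by Gaussian elimination with partial pivoting. It assumes coded (centred and scaled) regressors; with raw torque, density and pressure values, a QR-based solver would be numerically safer. Confidence limits on the $b_i$ are not computed here. All names and the synthetic coefficients are hypothetical.

```python
def design_row(x1, x2, x3):
    """Regressors of model (a): 1, X1, X2, X3, X1^2, X2^2, X1*X3, X1*X2, X2*X3."""
    return [1.0, x1, x2, x3, x1 * x1, x2 * x2, x1 * x3, x1 * x2, x2 * x3]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]   # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                 # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_model_a(points, responses):
    """Least-squares estimates b0..b8 from the normal equations X'X b = X'y."""
    rows = [design_row(*p) for p in points]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, responses)) for i in range(p)]
    return solve(xtx, xty)


# Synthetic check on a coded 3 x 3 x 3 grid (levels -1, 0, 1); coefficients are made up.
true_coeffs = [2.0, 1.0, -1.0, 0.5, 0.3, -0.2, 0.1, 0.4, -0.3]
grid = [(a, b, c) for a in (-1.0, 0.0, 1.0) for b in (-1.0, 0.0, 1.0) for c in (-1.0, 0.0, 1.0)]
lm = [sum(t * d for t, d in zip(true_coeffs, design_row(*pt))) for pt in grid]
estimates = fit_model_a(grid, lm)
```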

ERROR  MODELS 

The  error  models  are  of  significant  impact  in  that  they  will  establish 
criteria  for  measuring  the  performance  of  the  Helicopter  Lift  Margin  system. 

The  error  models  will  also  enable  us  to  compare  the  performance  of  the 
system  with  respect  to  the  standard  reference  provided  by  a  strain  gauge. 

Distributions of errors for various levels of torque, pressure (and altitude)
will be obtained. Estimates of the means and the respective standard errors
(of the means) will be obtained.
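One simple way to tabulate these error distributions is to group errors by test level and report each group's mean and standard error. The sketch below assumes an error is defined as the HLMS value minus the reference value (for example, the strain-gauge reading). That definition and the name `error_summary` are assumptions for illustration.

```python
from collections import defaultdict
from math import fsum, sqrt

def error_summary(levels, errors):
    """Group errors by level; return {level: (n, mean error, standard error of the mean)}."""
    groups = defaultdict(list)
    for lev, err in zip(levels, errors):
        groups[lev].append(err)
    out = {}
    for lev, errs in sorted(groups.items()):
        n = len(errs)
        mean = fsum(errs) / n
        # Sample standard deviation; undefined for a single observation.
        sd = sqrt(fsum((e - mean) ** 2 for e in errs) / (n - 1)) if n > 1 else float("nan")
        out[lev] = (n, mean, sd / sqrt(n))
    return out
```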

The  error  model  comparing  the  effect  of  air  density  as  computed  by  the 
NBS Air Density Equation (on the Helicopter Lift Margin System) with that
computed  by  the  algorithms  of  the  Transonics  Computer  is  of  special  importance. 

It  will  be  used  to  validate  the  equations  developed  by  the  National  Bureau 
of  Standards  (NBS) .  One  of  the  error  models  will  compare  the  air  density 
computed  by  the  NBS  equations  with  that  computed  by  ASTA. 
SYMBOLOGY

LM        Lift Margin
MAL       Maximum Available Lift
EGW       Effective Gross Weight
T         Temperature
CIT       Compressor Inlet Temperature
P         Pressure
ρ         Air Density
Q         Torque
Qms       Maximum Standard Torque
Ps        Static Pressure (absolute)
OGE       Out of Ground Effect
IGE       In Ground Effect
K1        1.0834
K2        -0.00556
K4, K6    0.94690; Engine rpm; .0003472; .000352
K7        6.493, K8 = 225 (Set 1); 7.639, K8 = 1325 (Set 2)
Qw        Torque at weighing (psi)
Po        Standard pressure, 29.921 (in Hg.)
BOX AWARDED THE 1972 SAMUEL S. WILKS MEMORIAL MEDAL


Introductory  Remarks  Made  by  Frank  E.  Grubbs,  Conference  Chairman 


Again, it is so nice for all of us to be together at another Army Design
of Experiments Banquet, and it is a pleasure to see all of you once again
this year for the Eighteenth Conference. This is not the first time we
have had the Army Design of Experiments Conference at Aberdeen Proving
Ground.  In  fact,  our  Sixth  Conference  was  held  here  twelve  years  ago,  and 
Sam  Wilks  was  with  us  then.  As  I  recall,  George  Box,  Churchill  Eisenhart, 
Stu  Hunter,  Boyd  Harshbarger  and  Bill  Cochran  were  present  for  that 
meeting,  but  John  Tukey  was  unable  to  attend,  although  we  are  glad  to  have 
him  back  for  this  year’s  conference. 

Of  all  things,  the  program  for  the  Sixth  Army  Design  of  Experiments 
Conference  at  Aberdeen  Proving  Ground  had  a  10  X  10  Graeco  Latin  Square 
on  its  cover!  Thus,  our  first  conference  at  Aberdeen  Proving  Ground 
occurred  just  a  year  or  so  after  the  so-called  "Euler-Spoilers"  came 
along.  Way  back,  the  great  mathematician,  Euler,  conjectured  that  it  was 
not  possible  in  general  to  construct  Graeco  Latin  Squares  of  even  sizes 
(2n + 2) for the Greek and Latin letters. (I might say, parenthetically,
that even though Tukey, Box and others present are thinking about that
statement, it is of sufficient accuracy for 90% to 95% of us present
anyway!) In any event, with high speed electronic computation capability
available,  an  attempt  was  made  in  the  late  1950's  to  construct  10  X  10 
Graeco  Latin  Squares  on  a  computer.  A  program  was  set  up  to  generate 
10  X  10  Latin  Squares,  attempting  to  pair  them  up  to  satisfy  the  Graeco 
Latin  Square  condition.  For  several  hundred  hours  of  running  time,  the 
unfortunate  computer  tried  to  "marry"  a  Latin  Square  to  a  Greek  Square 
and failed to do so. The "Euler-Spoilers" (R. C. Bose, S. S. Shrikhande,
and  E.  T.  Parker  of  the  University  of  North  Carolina)  on  hearing  about 
this  computer  failure  proved  with  the  help  of  advanced  group  theory  that 
Euler's  conjecture  was  wrong,  and  if  the  computer  had  been  left  to  run  the 
way  it  was  set  up,  it  might  have  had  a  50:50  chance  of  constructing  a 
10  X  10  Graeco  Latin  Square  in  too  many  years!  Thus,  we  were  just  in  time 
for  the  correct  construction  of  a  10  X  10  Graeco  Latin  Square  and  one 
appears  on  this  cover  of  our  Sixth  Army  Design  of  Experiments  Conference. 
Anyone  who  cares  to  may  check  it  out!  I'll  pass  it  around. 
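The "Graeco Latin Square condition" that the computer was trying to satisfy can be stated compactly: two Latin squares of order n combine into a Graeco Latin square when superimposing them produces each of the n*n (Latin, Greek) pairs exactly once. The Python sketch below checks that condition. The cyclic construction used to exercise it is a standard textbook one that only works for odd n; it is not the Bose, Shrikhande and Parker method and does not reproduce the square on the 1960 program cover.

```python
from itertools import product


def is_latin(square):
    """True if every row and every column of an n x n square is a permutation of 0..n-1."""
    n = len(square)
    symbols = set(range(n))
    rows_ok = all(set(row) == symbols for row in square)
    cols_ok = all({square[i][j] for i in range(n)} == symbols for j in range(n))
    return rows_ok and cols_ok


def is_graeco_latin(latin, greek):
    """Two Latin squares form a Graeco Latin square when every
    (Latin, Greek) ordered pair occurs exactly once."""
    n = len(latin)
    if not (is_latin(latin) and is_latin(greek)):
        return False
    pairs = {(latin[i][j], greek[i][j]) for i, j in product(range(n), repeat=2)}
    return len(pairs) == n * n


def cyclic_pair(n):
    """Textbook cyclic construction: L[i][j] = i + j, G[i][j] = 2i + j (mod n).
    G is a Latin square only when n is odd, so this cannot reach order 10."""
    latin = [[(i + j) % n for j in range(n)] for i in range(n)]
    greek = [[(2 * i + j) % n for j in range(n)] for i in range(n)]
    return latin, greek
```

Here `is_graeco_latin(*cyclic_pair(5))` is True, while `cyclic_pair(10)` fails because its second square is not Latin, which is consistent with order 10 needing a cleverer construction.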

We  now  turn  to  the  Samuel  S.  Wilks  Memorial  Medal. 

The  Samuel  S.  Wilks  Memorial  Medal  Award,  initiated  jointly  in  1964  by 
the  U.  S.  Army  and  the  American  Statistical  Association,  is  administered 
by  the  American  Statistical  Association,  a  non-profit,  educational  and 
scientific society founded in 1839. The Wilks Award is given each year
to a statistician and is based primarily on his contribution to the
advancement  of  scientific  or  technical  knowledge  in  Army  statistics, 
ingenious  application  of  such  knowledge,  or  successful  activity  in  the 
fostering  of  cooperative  scientific  matters  which  coincidentally  benefit 
the  Army,  the  Department  of  Defense,  the  U.  S.  Government,  and  our 
country  in  general. 

The Award consists of a medal, with a profile of Professor Wilks and the
name  of  the  recipient  on  the  reverse,  and  a  citation  and  honorarium  related 
to  the  magnitude  of  the  Award  funds.  The  annual  Army  Design  of  Experiments 
Conferences,  at  which  the  Award  is  given  each  year,  are  sponsored  by  the 
Army  Mathematics  Steering  Committee  on  behalf  of  the  Office  of  the  Chief 
of Research and Development, Department of the Army.

The  funds  for  the  S.  S.  Wilks  Memorial  Medal  Award  were  donated  by  Philip 
G.  Rust,  retired  industrialist,  Thomasville,  Georgia. 

Previous  recipients  of  the  Samuel  S.  Wilks  Memorial  Medal  include  John 
W.  Tukey  of  Princeton  University  (1965),  Major  General  Leslie  E.  Simon 
retired (1966), William G. Cochran of Harvard University (1967), Jerzy
Neyman of the University of California, Berkeley (1968), Jack Youden
(1969)  retired  from  the  National  Bureau  of  Standards  and  deceased,  George 
W.  Snedecor  (1970)  retired  from  Iowa  State  University,  and  Harold  Dodge 
(1971)  retired  from  Bell  Telephone  Laboratories. 

With  the  approval  of  ASA  President  William  H.  Shaw,  the  1972  Wilks  Medal 
Committee consisted of:

Professor  Robert  E.  Bechhofer  -  Cornell  University 

Dr. Fred Frishman - Army Research Office, Washington, D.C.

Professor J. Stuart Hunter - Princeton University

Professor  Oscar  Kempthorne  -  Iowa  State  University 

Dr. Badrig Kurkjian - US Army Materiel Command, Washington, D. C.

Professor  Fred  Leone  -  The  University  of  Iowa 

Dr.  William  R.  Pabst,  Jr.  -  Washington,  D.  C. 

Major General Leslie E. Simon - Retired

Dr. Frank E. Grubbs, Chairman - US Army Ballistic Research Labs,
Aberdeen Proving Ground, Maryland

As many of you conferees are aware, our process of selecting the Wilks
Memorial  Medalist  each  year  turns  out  to  be  a  statistically  significant 
event,  having  to  screen  25-30  nominees,  fighting  out  the  basic  purposes  of 
the  Wilks  Medal  (including  what  statistics  have  found  wide  application  to 
government  work  and  are  highly  relevant  to  the  Army),  and  committee 
members occasionally exchanging insults as the situation demands! Again,
however,  we  certainly  got  the  right  man. 

The  1972  Samuel  S.  Wilks  Memorial  Medalist  is  an  internationally  recognized 
authority  on  statistics  and  has  contributed  greatly  to  the  design  and 
analysis of scientific experiments. He was born at a place that sounds a
bit  "redundant",  Gravesend,  Kent,  England,  in  1919.  He  began  his  career 
as a Statistician during World War II, and completed his formal education
in  statistics  with  a  Bachelor  of  Science  Degree  (1947)  and  a  PhD  (1952) 
from  the  University  of  London.  Now  as  I  go  along,  the  Wilks  Medalist 
for 1972 should be shown to be above or go beyond the 95% statistical
level of significance, so to speak! So, we will begin accumulating the
points. Now this man, during World War II, and as a consequence of his
clever use of experimental design in the analysis of enemy toxic materials,
was  awarded  the  British  Empire  Medal  in  1946.  For  that  we  will  give  him 
15  percentage  points.  After  obtaining  his  PhD  in  Statistics  from  the 
University  of  London,  he  left  to  work  for  Imperial  Chemical  Industries  Ltd., 
where he had the opportunity to come into contact with real world experimentation.
For that we will give him 10 percentage points and he is up
to  25.  On  leave  of  absence  from  Imperial  Chemical  Industries,  he  spent 
the year 1952-53 at the Institute of Statistics, Raleigh, North Carolina,
and  of  all  things  on  a  research  grant  supported  by  the  Army  Research 
Office - Durham (ARO-D). Now anyone who would take on some of the statistical
problems of the Army deserves special recognition, so we will give
him  20  percentage  points  for  succeeding  at  that.  My  word,  we  are  already 
up  to  45  percentage  points.  While  working  for  ARO-D,  a  famous  expository 
paper on the Exploration and Exploitation of Response Surfaces appeared,
along  with  ideas  of  "robustness"  in  the  analysis  of  variance,  and  also 
those  important  rotatable  designs.  On  his  return  to  Imperial  Chemical 
Industries,  our  1972  Wilks  Medalist  delved  into  statistical  methods  for 
the  elucidation  of  basic  mechanisms  and  advanced  the  concept  of  Evolutionary 
Operations.  In  1957,  he  became  the  Director  of  the  Statistical  Techniques 
Research  Group  at  Princeton  University,  sponsored  also  by  the  Army 
Research  Office,  and  during  the  years  at  "Gauss  House"  at  Princeton  he 
came  to  know  Sam  Wilks  quite  well  and  established  a  vigorous  statistical 
center  there.  Papers  on  design  for  non-linear  models,  simplex  sum  and 
three-level  designs,  the  generation  of  random  normal  deviates,  and  papers 
on adaptive optimization and robustness to non-normality of regression
were  completed.  For  all  of  this  work,  most  of  which  was  very  useful  to  the 
Army  and  others  as  well,  he  must  be  given  a  significant  number  of  points, 
anyway,  say  20.  It  seems  we  just  hit  65  percentage  points  and  still 
counting.  In  1960,  our  1972  Wilks  Medalist  left  Princeton  to  become  a 
member  of  the  then  Army  Mathematics  Research  Center  (now  just  the  Mathematics 
Research  Center  since  that  place  was  bombed  out!)  at  the  University  of 
Wisconsin.  At  Wisconsin,  he  immediately  took  the  lead  in  establishing  the 
Department  of  Statistics.  His  research  interests  while  at  Wisconsin  have 
steadily  broadened  and  in  addition  to  further  contributions  to  fractional 
factorials  and  sequential  designs  for  non-linear  models,  he  became  concerned 
with  problems  of  non-linear  estimation,  the  dynamic  control  of  industrial 
processes, and he "joined the opposition" in contributing to Bayesian
methods  (!),  and  finally  parametric  time  series  analysis.  In  fact,  last 
year  our  1972  Wilks  Medalist  was  elevated  to  the  R.  A.  Fisher  Chair  of 
Statistics  at  Wisconsin.  Furthermore,  we  are  reminded  that  back  in 
1964 he was awarded the Guy Medal of the Royal Statistical Society,
London. Now for all this we must give him 30 points and oops, we hit the
95% level. But wait a minute! In 1968, our 1972 Wilks Medalist got the
Shewhart Medal of the American Society for Quality Control for his
contributions there. Now that is a fine honor indeed, but I am reminded
that a couple of Army Statisticians got that too, so we will deduct five
percentage points, bringing him back to only 90, which is not quite enough.

Oh my, I have nevertheless forgotten something important. A former
summer student employee of mine 20 years ago here at Aberdeen Proving Ground
had enough inherent capability in statistics to work with our Wilks
Medalist during the year (1952-53) he was turning out all that fine statistical
work at the Institute of Statistics, Raleigh, N. C., for ARO-Durham.
Furthermore, that former summer student employee of mine, with such excellent
training, has become a famous statistician in his own right. Now with all
of this good work going on, and hopefully as a result of some well chosen
Army contacts, we must add more points, but one can't throw in too many more
points, so we will settle for a final 8 points, and award George Box the
1972 Samuel S. Wilks Memorial Medal at 98 percentage points! Congratulations,
George Box, and I'll now call on Churchill Eisenhart, Past President of
ASA, to present the 1972 Wilks Medal.

GEORGE  E.  P.  BOX  RECEIVES  THE  1972  SAMUEL  S.  WILKS  MEMORIAL  MEDAL 

The  Presentation  of  the  Award  Made  by  Churchill  Eisenhart, 

Past  President  of  the  ASA 

The  following  citation  was  read: 

"To  George  E.  P.  Box,  in  recognition  of  his  many  significant 
contributions  to  experimental  design,  robustness,  Evolutionary 
Operations,  Bayesian  methods  and  time  series  analysis,  and 
for his leadership in relating theoretical results to
practical problems."

ACCEPTANCE  REMARKS  OF  GEORGE  E.  P.  BOX  ON  RECEIVING  THE 
SAMUEL  S.  WILKS  MEMORIAL  MEDAL  FOR  1972 

General Koster and fellow Statisticians:

You  do  me  especial  honor  in  presenting  me  with  an  award  commemorating 
Sam  Wilks.  Wilks  as  you  all  may  know  was  a  wise  and  greatly  loved  man, 
who also was a distinguished statistician. He made fundamental contributions
to statistical theory, but he was also a man who believed in
statistics  as  a  key  to  solving  practical  problems.  This  was  evidenced 
especially  by  his  work  beginning  in  World  War  II  for  the  National  Defense 
Research  Committee,  his  setting  up  and  his  direction  of  the  Princeton 
Statistical  Group,  his  work  on  Quality  Control,  his  originating  of  the 
unique  yearly  Princeton  Conference  and  his  initiating  of  the  Army 
Conference on the Design of Experiments, the Eighteenth of which we
presently  celebrate. 

It  has  sometimes  seemed  to  me  a  great  pity  that  his  very  proper  attitude 
towards the complementary functions of theory and practice has not been
more widely understood. Practice alone uses too small a cookbook to
produce dishes which are often stale, tasteless and inappropriate to
the occasion. Theory alone is directionless and can wander into labyrinths
of  pointless  abstraction.  But  practice  inspiring  new  theory  and  theory 
tested  with  new  practice  can  produce  wonders. 

I  have  sometimes  heard  members  of  the  statistical  profession  avow  in 
voices that admired their own liberalism that, yes, there should be some
schools  of  applied  statistics  as  well  as  some  concerned  with  mathematical 
statistics.  The  implication  was  that  there  really  could  be  no  harm  in 
this  so  long  as  they  remained  far  enough  apart! 

It  is  rather  like  trying  to  produce  a  fine  race  of  children  by  encouraging 
the  development  of  a  healthy  group  of  men  and  the  separate  but  equal 
development of a corresponding group of women, and at the same time
taking precautions to ensure that they never see each other.

Sam  Wilks  was  very  conscious  of  these  communication  problems  and  it  was 
by  instituting  such  conferences  as  the  present  one  and  by  many  parallel 
efforts, some of which I have already mentioned, that he contributed to
their  solution. 

One  of  the  difficulties  that  gets  in  the  way  of  fruitful  interaction 
between scientific experimenters and statisticians is intellectual arrogance.
The  fault  can  lie  on  either  or  both  sides,  but  I  blush  to  confess  that  most 
often  it  is  the  statistician  who  is  in  error. 

There  are  various  levels  of  knowledge  and  ignorance  which  have  been 
recognised  by  philosophers.  Among  these  are  knowing  that  you  know,  not 
knowing  that  you  know,  knowing  that  you  don't  know,  and  not  knowing  that 
you don't know. The tragedy of the last category is that once you are
in it you remain in it.

Perhaps  one  story  of  near  disaster  will  serve  as  illustration.  Twenty 
some odd years ago I remember being involved in the design of an experiment
to compare two treatments A and B from a batch chemical process. I
made  up  my  mind  early  on  (too  early  on  as  it  turned  out)  that  what  they 
needed  was  a  standard  paired  design  in  which  batches  consecutive  in  time 
would  constitute  a  pair,  and  the  order  (AB)  or  (BA)  would  be  randomly 
assigned  within  each  pair.  It  was  quite  clear  that  the  chemists  thought 
this  was  a  bad  idea.  Without  further  thought  I  attributed  their  objection 
to  laziness.  I  more  than  hinted  that  they  couldn’t  be  bothered  to  do  the 
job right! It was only after much argument that I allowed myself to hear
what was the real nature of their objections. The process was one where
what  is  called  "carry  over"  occurs.  In  these  circumstances,  the  most 
efficient  design  is  not  a  paired  arrangement  but  one  in  which  several  B's 
follow  several  A's  which  is  what  they  had  proposed! 

The  technical  explanation  is  of  course  that  the  carry  over  phenomenon 
produces  a  time  series  in  which  successive  observations  are  negatively 
correlated.  The  usual  assumption  that  propinquity  produces  high  positive 
correlation which leads to pairing breaks down. Once I understood this,
I was not slow in explaining it to them. In discussion, I pointed out its
implications in terms of spectral analysis and the insights it provided
on  randomization  theory.  They  were  very  polite  about  it;  they  said  they 
didn't  know  anything  about  all  that  but  they  seemed  glad  that  I  had  at 
last  seen  sense. 
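Box's explanation can be checked numerically under a simple assumption: treat successive batch results as an AR(1) series with unit variance whose lag-one correlation phi is negative when carry-over occurs. The Python sketch below computes the exact variance of the (mean A) minus (mean B) estimate for the randomized paired design, averaged over all AB/BA orders, and for a blocked order with several A's followed by several B's. The AR(1) model and the value phi = -0.5 are illustrative assumptions, not data from the chemical process he describes.

```python
from itertools import product


def ar1_cov(n, phi):
    """Covariance of n consecutive batches under AR(1) with unit variance:
    cov(y_i, y_j) = phi ** |i - j|.  phi < 0 mimics carry-over."""
    return [[phi ** abs(i - j) for j in range(n)] for i in range(n)]


def contrast_variance(order, phi):
    """Variance of mean(A) - mean(B) for a run order such as 'AABB'."""
    n_a, n_b = order.count("A"), order.count("B")
    c = [1.0 / n_a if t == "A" else -1.0 / n_b for t in order]
    cov = ar1_cov(len(order), phi)
    n = len(c)
    return sum(c[i] * c[j] * cov[i][j] for i in range(n) for j in range(n))


def paired_design_variance(n_pairs, phi):
    """Average variance over every random choice of AB or BA within consecutive pairs."""
    orders = ["".join(p) for p in product(("AB", "BA"), repeat=n_pairs)]
    return sum(contrast_variance(o, phi) for o in orders) / len(orders)


for phi in (-0.5, 0.0, 0.5):
    print(phi, paired_design_variance(4, phi), contrast_variance("AAAABBBB", phi))
```

With phi = -0.5 and eight batches, the paired design gives a variance of 0.75 against about 0.243 for AAAABBBB. With phi = +0.5 the ranking reverses (0.25 against about 0.812), which is the usual positively correlated case that motivates pairing.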

I  accept  this  medal  aware  of  my  luck  in  having  had  patient  scientific 
colleagues  and  acknowledge  the  debt  I  owe  them  for  taking  the  time  to 
educate  a  statistician.  I  know  I  would  have  learned  more  if  I  had 
listened  better. 


AUTOMATED  RADAR  DATA  PROCESSING  AT 
WHITE  SANDS  MISSILE  RANGE  FEATURING 
ADAPTIVE  FILTERING  WITH  BIAS  ESTIMATION 


W.A.  McCool 

Analysis  and  Computation  Division 
ABSTRACT. This paper describes the automated system for radar data
processing  now  being  implemented  at  White  Sands  Missile  Range.  Such  a 
system  will  significantly  reduce  the  average  delivery  time  of  the  data 
reports  containing  the  processed  data  and,  in  priority  situations,  will 
provide  such  data  reports  essentially  on-line,  i.e.  within  minutes  after 
mission  completion.  Furthermore,  the  number  of  instrumentation  radars 
generating data to be processed has been steadily increasing as a result
of  Range  modernization  and  special  radar  projects  (AMRAD,  RAMPART,  and 
others).  The  automation  has  become  feasible  with  the  availability  of: 

(1)  large-scale  third  generation  computing  facilities  (three  UNIVAC  1108 
systems);  (2)  the  capability  to  record  all  radar  data  at  the  central 
computing  site;  and  (3)  software  containing  new  data  editing  techniques 
and  adaptive  digital  filters  which  are  based  on  Kalman  filter  concepts 
and,  at  the  same  time,  are  computationally  efficient.  Feasibility  of  the 
automated system has already been demonstrated and its implementation
has  now  advanced  to  the  point  of  developing  operating  procedures. 

INTRODUCTION. As its major mission, the White Sands Missile Range
(WSMR)  provides  instrumentation,  air-space,  and  supporting  facilities 
for  testing  missile  weapon  systems  and  conducting  scientific  experiments, 
e.g. the Army's PERSHING missile system, the Air Force's ATHENA missile
system  for  re-entry  research,  and  the  Air  Force's  Project  621B  for  proving 
out  a  new  satellite  navigation  system.  This  mission  requires  a  total  of 
18 precise instrumentation radars for tracking missiles, satellites, aircraft,
bombs, balloons, parachutes, and any other flying objects and
generating  corresponding  trajectory  (metric)  data.  During  the  past  year 
WSMR  radars,  sampling  observed  target  positions  20  times  every  flight  second, 
generated  more  than  250,000,000  space-point  measurements.  The  expeditious 
processing of this huge volume of radar data is critical to acceptable
project support. At the present time, the average delivery time of the
standard  post-flight  reports  containing  the  processed  metric  data  is  about 
ten  days.  "Quick-look"  radar  data  reports,  with  a  limited  number  of 
processed  metric  parameters,  are  usually  delivered  in  less  than  one  working 
day  after  mission  completion.  The  major  objective  of  the  automated  radar 
processing system is to reduce the average report delivery time by an
order  of  magnitude  and,  in  priority  situations,  to  an  on-line  situation. 
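A quick arithmetic check on the scale of that workload, taking the figures above at face value and treating each space-point measurement as one radar sample: at 20 samples per second, 250,000,000 measurements correspond to 12.5 million radar-tracking seconds. Spreading that evenly over the 18 radars is an illustrative assumption; the paper gives no per-radar breakdown.

```python
# Figures quoted in the text; "more than" 250 million, so these are lower bounds.
MEASUREMENTS_PER_YEAR = 250_000_000
SAMPLES_PER_SECOND = 20   # target position sampled 20 times every flight second
RADARS = 18               # precise instrumentation radars at WSMR

tracking_seconds = MEASUREMENTS_PER_YEAR / SAMPLES_PER_SECOND
tracking_hours = tracking_seconds / 3600
hours_per_radar = tracking_hours / RADARS  # assumes an even split across radars

print(f"{tracking_seconds:,.0f} radar-seconds")
print(f"{tracking_hours:,.0f} radar-hours, about {hours_per_radar:.0f} per radar")
```

That is roughly 3,500 radar-hours of tracking a year, or a little under 200 hours per radar.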
WSMR DATA PROCESSING FACILITIES. The Department of Army has
provided  WSMR  with  large-scale  third-generation  multi-processing, 
digital  computing  facilities  consisting  of  two  systems  (denoted  as 
the  A  and  B  systems)  each  containing  two  UNIVAC  1108  main-frames  and 
another  system  (C)  containing  a  single  1108  main-frame.  The  A  system 
is  devoted  primarily  to  real-time  mission  support,  the  B  system  to 
remote  terminal  support  and  normal  batch  processing  of  unclassified 
data,  and  the  C  system  to  batch  processing  of  classified  data.  The 
automated  processing  of  radar  data  will  be  handled  as  "background" 
workload  in  the  A  multi-processing  system.  There  is  a  possibility  that 
at  some  future  time,  part  or  all  of  the  automated  processing  will  be 
transferred to the C system. At the present time the WSMR UNIVAC computers
are: (1) processing about 16,000 jobs per month producing more
than 100,000,000 lines of listing; (2) supporting about 25 real-time
missions  per  month;  (3)  producing  between  800  and  1200  data  reports 
of  all  types  per  month;  and  (4)  supporting  about  30  remote  terminals 
(a  total  of  66  terminals  is  planned). 

Figure 1 depicts the basic functions comprising the "Real-Time
Data System" (RTDS) at WSMR which was instituted on a very
small  scale  in  1961  and  through  the  Range  modernization  program 
has  become  a  highly  sophisticated  system  for  satisfying  current 
and  future  real-time  mission  support  requirements.  In  addition  to 
the  UNIVAC  computer  A  system  with  its  real-time  interfaces,  the 
RTDS has a very flexible and reliable data communication subsystem
which transmits data from 39 sensors, including radars, to
the  real-time  computer  via  a  Data  Control  subsystem  and  from  the 
computer  to  44  sensors  requiring  acquisition  (pointing)  data.  The 
Data  Control  subsystem  includes  a  central  recording  facility  which 
is  capable  of  simultaneously  recording/playing-back  the  data  from 
all the transmitting sensors as well as from the computer when it
is  generating  acquisition  data  to  all  the  receiving  sensors.  All 
WSMR  instrumentation  radars,  when  so  scheduled,  transmit  their 
data in real-time to the central recording facility. The play-back
capability  not  only  provides  a  permanent  mission  data  record  but 
also  is  an  essential  component  of  the  automated  radar  data 
processing  system  because,  in  general,  the  processing  will  not  be 
accomplished  during  missions.  Figure  2  indicates  how  data  are 
transmitted  between  a  sensor  and  the  central  data  control  site. 
FIGURE 1


FIGURE  2 


The  modulation-demodulation  process  is  employed  to  allow  sensor 
data  to  be  efficiently  and  reliably  two-way  transmitted  over 
relatively  inexpensive  first-class  telephone  lines  without 
distance limitation. The transmitting-receiving devices, called MODEMS,
one at each end of the line, utilize tone-modulated "carriers". Recording
these carriers with an instrumentation recorder, similar to a home type
recorder, is an
extremely  simple  and  reliable  technique  (called  analog  recording) 
which  has  been  standard  practice  at  WSMR  since  1964.  On  playback, 
each  carrier  is  demodulated  by  a  MODEM  to  reconstruct  the  original 
digital  data,  generated  by  the  sensor,  to  be  received  by  the 
real-time  computer. 
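The paper says only that the MODEMS use tone-modulated carriers. As an illustration of how digital sensor data can survive analog recording and playback, the Python sketch below implements binary frequency-shift keying. The tones (1200 Hz for a 1 bit, 2200 Hz for a 0 bit, Bell 202-style), the 1200 bit/s rate and the 48 kHz sample rate are all assumptions; the paper does not say which modem standard WSMR used.

```python
import math

# Assumed modem parameters, for illustration only.
MARK_HZ, SPACE_HZ = 1200.0, 2200.0   # tone for a 1 bit, tone for a 0 bit
SAMPLE_RATE = 48_000                  # samples per second of the "analog" signal
BAUD = 1200                           # bits per second
SAMPLES_PER_BIT = SAMPLE_RATE // BAUD


def fsk_modulate(bits):
    """Turn digital data into a tone-modulated carrier: one tone burst per bit."""
    samples = []
    for k, bit in enumerate(bits):
        freq = MARK_HZ if bit else SPACE_HZ
        for i in range(SAMPLES_PER_BIT):
            t = (k * SAMPLES_PER_BIT + i) / SAMPLE_RATE
            samples.append(math.sin(2 * math.pi * freq * t))
    return samples


def tone_energy(window, freq, start):
    """Energy of one tone in a window via I/Q correlation, so carrier phase does not matter."""
    re = im = 0.0
    for i, x in enumerate(window):
        phase = 2 * math.pi * freq * (start + i) / SAMPLE_RATE
        re += x * math.cos(phase)
        im += x * math.sin(phase)
    return re * re + im * im


def fsk_demodulate(samples):
    """Recover the bits (e.g. from a played-back recording) by picking the stronger tone per bit."""
    bits = []
    for start in range(0, len(samples) - SAMPLES_PER_BIT + 1, SAMPLES_PER_BIT):
        window = samples[start:start + SAMPLES_PER_BIT]
        mark = tone_energy(window, MARK_HZ, start)
        space = tone_energy(window, SPACE_HZ, start)
        bits.append(1 if mark > space else 0)
    return bits
```

For example, `fsk_demodulate([0.4 * s for s in fsk_modulate([1, 0, 1, 1, 0])])` recovers `[1, 0, 1, 1, 0]`; the amplitude scaling stands in for an attenuated playback. Comparing tone energies rather than absolute levels is what makes the decision insensitive to recording gain.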

The  standard  vendor-supplied  executive  program  for  the  UNIVAC 
1108  computer  is  called  EXEC-8  which  has  been  augmented  at  WSMR 
for efficient real-time data processing. This augmented EXEC-8
along  with  specific  real-time  applications  programs  can  support 
six missions concurrently in a wide variety of mission types,
being  limited  primarily  by  the  number  of  available  input/output 
data  channels  from  and  to  Range  sensors.  The  average  number  of 
concurrent  missions  requiring  real-time  support  at  any  given  time 
during  a  typical  Range  day  for  the  next  several  years  will  be  no 
more  than  two.  This  means  that  plenty  of  computing  capacity  will 
be  available  for  back-ground  processing  required  by  the  automated 
system.

Real-time  data  processing  utilizing  a  large-scale  digital 
computer began at WSMR in 1962. Because of the computing time
constraints  inherent  in  this  type  of  data  processing,  there  has 
been  a  continuing  evolution  since  that  time  in  the  development  of 
more  efficient  data  handling,  editing,  selection,  and  filtering 
techniques  as  well  as  a  wide  variety  of  mathematical  procedures, 
algebraic equations, computation of Keplerian orbital trajectories
for  instantaneous  impact  prediction,  coordinate  transformations, 
and  command  generation. 
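The coordinate transformations mentioned above begin, for a single radar, with converting slant range, azimuth and elevation into local Cartesian coordinates. The Python sketch below is a minimal flat-earth version, assuming azimuth is measured clockwise from north and ignoring earth curvature and refraction, which real range processing must correct for; it is not WSMR's transformation.

```python
import math


def rae_to_enu(range_m, azimuth_deg, elevation_deg):
    """Radar observation (slant range, azimuth clockwise from north, elevation
    above horizontal) -> local east/north/up coordinates centred on the radar."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    horizontal = range_m * math.cos(el)   # ground-plane projection of the range
    east = horizontal * math.sin(az)
    north = horizontal * math.cos(az)
    up = range_m * math.sin(el)
    return east, north, up


def enu_to_rae(east, north, up):
    """Inverse transformation, e.g. for generating pointing (acquisition) angles."""
    horizontal = math.hypot(east, north)
    range_m = math.hypot(horizontal, up)
    azimuth_deg = math.degrees(math.atan2(east, north)) % 360.0
    elevation_deg = math.degrees(math.atan2(up, horizontal))
    return range_m, azimuth_deg, elevation_deg
```

The inverse direction matters here because the same real-time computer also sends acquisition (pointing) data back to sensors.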

It  should  be  clear  from  the  foregoing  discourse  that  WSMR 
now  has  the  necessary  facilities  to  automate  its  radar  data 
processing,  i.e.  the  third  generation  digital  1108  computer  system 
with  real-time  interfaces,  the  Real-Time  Data  System  with  a 
versatile central recording capability, complete operational software
for supporting real-time multi-processing, and a vast experience
and "know-how" in real-time data processing techniques
and procedures. This picture would not be complete without
clearly  emphasizing  an  operational  constraint  around  which 
this  automated  system  must  be  designed:  conduct  of  every  Range 
mission  has  unqualified  priority  over  all  post-flight  automated 
system  requirements.  More  specifically,  there  are  two  areas  in 
which this constraint will occasionally arise: suspension of
background  computing  during  real-time  mission  support  and  data 
communication  problems  which  would  preclude  real-time  recording 
but not mission support. The impact of the first situation
would  merely  be  a  short,  insignificant  delay;  the  impact  of  the 
second  situation  would  incur  the  delays  involved  in  physically 
transporting  radar  data  tapes  recorded  on-site  to  the  Data 
Control  area  in  the  Range  Control  Center. 

DESIGN  OBJECTIVES  OF  THE  AUTOMATED  SYSTEM.  The  design  of  the 
automated  system  for  processing  radar  data  has  four  major  objectives. 

1.  An  average  delivery  time  of  one  working  day  after 
mission  completion  for  all  standard  data  reports 

Achievement  of  this  objective  will  effectively  combine  the  post- 
flight  and  quick-look  types  of  reports  now  being  provided  to 
the  Range  users  in  ten  and  one  days,  respectively.  Priority 
delivery  may  be  provided  with  "on-line"  processing,  the  reports  being 
computer  listings  generated  almost  immediately  after  missions. 

Reports  requiring  precise  plots  will  be  delivered  in  two  days  until 
an  on-line  high-speed  plotter  is  acquired.  A  standard  data  report  limits 
the  user  to  prescribed  options  which  have  been  established  by  frequency 
of  use.  Delivery  of  non-standard  reports,  i.e.  those  having  unique 
requirements,  must  be  considered  on  a  case-by-case  basis.  This  objective 
is  motivated  by  the  fact  that,  in  many  cases,  the  value  of  missile  test 
data  to  the  user  decreases  rapidly  with  every  passing  day  after  a  mission. 
It  should  be  clear  that  this  objective  could  not  be  achieved  without 
central  recording  and  real-time  data  processing  capabilities. 

2. Use of Kalman filter concepts and real-time
editing  techniques  to  optimize  the  quality  of 
the  processed  radar  data  and  the  computing 
efficiency  of  the  processing. 

This  objective  has  already  been  accomplished  with  the  proven  editing 
and  filtering  techniques  described  later  on  in  this  paper.  Historically, 
radar data at WSMR has been processed using typical classical procedures
(e.g. moving-arc least-squares smoothing) which are computationally inefficient
and less effective compared to the performance of some of the
more modern recursive methods, such as the QD digital filter. The
latter, for example, is employed in all the programs which have been
generating  quick-look  radar  reports. 
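The paper credits recursive filters such as the QD filter with being cheaper and more effective than moving-arc least-squares smoothing, but does not give their equations at this point. As a stand-in, the Python sketch below shows a fixed-gain alpha-beta tracker, which can be viewed as the steady-state form of a Kalman filter for a constant-velocity target, with a simple innovation gate for data editing. It is not the QD filter; the gains and the gate are hypothetical values chosen for illustration.

```python
def alpha_beta_track(measurements, dt, alpha=0.5, beta=0.1, gate=None):
    """Recursive constant-velocity filter for one radar coordinate.

    measurements: samples of one coordinate taken every dt seconds.
    alpha, beta:  hypothetical fixed gains (stable for 0 < alpha < 2,
                  0 < beta < 4 - 2 * alpha).
    gate:         optional editing threshold; a sample whose innovation
                  (measurement minus prediction) exceeds it is rejected
                  and the filter coasts on its prediction.
    Returns a list of (position, velocity, edited) tuples.
    """
    x, v = measurements[0], 0.0           # start at first sample, unknown velocity
    track = [(x, v, False)]
    for z in measurements[1:]:
        x_pred = x + v * dt               # predict one sample ahead
        innovation = z - x_pred
        if gate is not None and abs(innovation) > gate:
            x = x_pred                    # edit out the wild point and coast
            track.append((x, v, True))
            continue
        x = x_pred + alpha * innovation   # correct position
        v = v + (beta / dt) * innovation  # correct velocity
        track.append((x, v, False))
    return track
```

With dt = 0.05 (20 samples per second), each new sample costs a few multiplications regardless of track length, whereas a moving-arc fit re-solves a least-squares problem over the whole arc at every output point; that is the efficiency argument made above.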


3.  Providing  multi-station  trajectory  and  bias  estimates 
and  all  required  types  of  error  estimates  on  a  point- 
by-point  basis. 

These estimates are included in the automated processing computer program
with  very  little  expense  in  computing  time.  In  the  past,  multi-station 
trajectory  estimates  (sometimes  called  an  N-station  solution)  have  been 
provided  with  an  average  delivery  time  of  20  days;  bias  estimates,  based 
on  radar  data  only,  have  not  been  provided  at  all;  and  the  variety  of 
error  estimates  has  been  limited  by  computing  time  to  a  few  available 
types.  Quick-look  data  reports,  for  example,  have  never  provided  any 
error  estimates. 
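The paper does not spell out here how its multi-station (N-station) trajectory estimates are formed. Under the simplifying assumption that each radar supplies an unbiased estimate of the same coordinate with a known, independent error variance, the least-squares combination is the inverse-variance weighted mean, sketched below in Python for one coordinate at a time. Real multi-station solutions work from the raw range and angle observations and, as the title of this paper indicates, must also estimate station biases, which this sketch ignores.

```python
def fuse_stations(estimates):
    """Combine independent per-station estimates of one coordinate.

    estimates: list of (value, variance) pairs, one per radar.
    Returns (weighted mean, variance of the weighted mean), the
    least-squares combination when station errors are independent
    and unbiased.
    """
    weights = [1.0 / var for _, var in estimates]
    total = sum(weights)
    value = sum(w * x for w, (x, _) in zip(weights, estimates)) / total
    return value, 1.0 / total
```

For example, two radars reporting 100.0 and 104.0 with equal variance 4.0 fuse to 102.0 with variance 2.0; the returned variance is also the point-by-point error estimate for the combined solution.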


4.  Simplification  and  standardization  of  data  report 
formats. 

Current  data  formats  have  been  established  by  user  specification  and 
historical  development.  In  the  automated  system,  the  formats  of  the  data 
listing, the explanatory information, and plots are being standardized
with  a  variety  of  available  options. 

FUNCTIONAL  DESCRIPTION  OF  THE  AUTOMATED  RADAR  DATA  PROCESSING  SYSTEM. 

The essential functions of the automated system are shown in the data flow
diagram, Figure 3. The system naturally divides into two areas: Data Control
and the UNIVAC 1108-A computer. Data Control accepts the radar data
from  the  Range  data  communication  network  and  other  pertinent  data;  the 
computer  generates  the  radar  data  reports. 

The  radar  data  entering  Data  Control  is  contained  in  the  modulation  of 
the  modem  carriers,  as  indicated  by  the  broken  lines  in  Figure  3.  The 
data in this analog form are recorded along with timing data and simultaneously
transmitted to the modem receivers for demodulation into digital
form. Transmission to the modem receivers is often deferred to a playback
of  the  recorded  data.  At  this  point  the  digital  radar  data  are  monitored 
by Data Control personnel to evaluate data quality and equipment performance.
One of the major functions of the Special Interface Equipment (SIE) is to simultaneously
accept, reformat, and store the digital data from as many as 32
radars  (or  other  sensors)  and  transmit  the  data  to  the  computer.  Prior 
to  accepting  radar  data  during  a  mission  or  a  play-back  in  Data  Control, 
radar  calibration  data  (i.e.  corrections  for  tilt,  beam  misalignment,  etc.), 
radar meteorological data for refraction corrections, radar identification
data,  and  program  and  report  parameter  options  must  be  fed  into  the  SIE  by 
a system controller via a CRT terminal (U-100) keyboard and/or punch-cards
(1004  card  reader-punch-printer). 

When the radar data and the pre-processing information are accepted
by the computer from the SIE, they are recorded on a log tape (UNILOG)
and/or magnetic drum mass storage (FASTRAND) under control of RTEXEC, the
real-time monitor. In other words, data logging is a real-time process.
After logging is completed, a high-priority batch job is automatically
initiated to process the logged data and generate a data report. As
indicated in Figure 3, the processing consists of three phases under the
control of a Driver program (in turn controlled by EXEC-8, of course):
QDKMST (described in a later chapter) concurrently edits and
filters the data from several radars and generates derivative parameters
and error estimates; "further processing" software consists of a variety
of optional sub-programs that generate ancillary parameters as required by
the Range users; and report generation consists of converting parameters to
specified engineering units, setting up data report formats, paging, etc.,
and interfacing with one or more of the selected output devices: line
printer, microfilm printer, cathode ray tube (CRT) display, or magnetic
tape transports.

From  the  functional  description,  it  should  be  clear  that  generating 
a  data  report  for  a  single  Range  mission  involving  several  radars  is  a 
straightforward  process.  The  question  may  be  raised,  however,  as  to 
the  effectiveness  of  the  automated  system  under  a  heavy  workload.  In 
response  to  this  question,  it  should  be  emphasized  that  the  automated 
system  may  be  viewed  as  a  "next  generation"  version  of  the  quick-look 
system which is now generating about half of all WSMR radar data reports.


As noted before, the average delivery time for quick-look radar
data reports is one working day after mission completion, and this has
been accomplished with an IBM 7094 II/7044 Direct Couple System (DCS)
whose job stream execution is sequential, i.e., one job at a time.

In contrast, the automated system has a dual 1108 multi-processor
(System A) which executes six jobs and supervises all input-output
activity concurrently. Even though System A is dedicated to real-time
mission support, it has more than ample capacity to satisfy the automated
system's computing requirements. Thus, the major bottleneck will be
the manual time and effort needed to set up each job in accordance with user
requirements and options and to collect and insert pre-processing data.




REVIEW OF THE QDK FILTER/SMOOTHER THEORY. There has been a
long-standing, urgent need for a process to filter radar data that
would be significantly superior to those currently in use with
respect to computing speed and filtering effectiveness. It was
recognized that such a process would have to adaptively effect an
optimum trade-off between noise suppression (attenuation) and best
fit (minimum distortion of the true signal) in accordance with the
characteristics of the noise content and the kinematics of the
input data. It was also recognized that the constraint of increased
computing speed would almost surely dictate the use of a recursive procedure.
The QDK filter was developed to satisfy this need. The basic QDK
structure is identical to that of the discrete Kalman filter, but its
detailed formulation is greatly simplified using the QD filter theory
(hence the QDK acronym) for rapid computation without seriously
degrading the Kalman optimality. To indicate how these
simplifications are applied, we begin with the step-by-step
computational formulas for the discrete Kalman filter.


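$$\bar{x}_k = \Phi \hat{x}_{k-1} + \Gamma u \qquad (1)\ \text{state prediction}$$

$$\bar{P}_k = \Phi \hat{P}_{k-1} \Phi^{T} + \Upsilon Q \Upsilon^{T} \qquad (2)\ \text{state covariance prediction}$$

$$W = \bar{P}_k H^{T} \left( H \bar{P}_k H^{T} + R \right)^{-1} \qquad (3)\ \text{weighting matrix}$$

$$\hat{x}_k = \bar{x}_k + W \left( z - H \bar{x}_k \right) \qquad (4)\ \text{state estimation}$$

$$\hat{P}_k = \left( I - W H \right) \bar{P}_k \qquad (5)\ \text{state covariance estimation}$$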
where:

  $\bar{x}_k$ - predicted state vector
  $\hat{x}_k$ - current estimated state vector
  $\hat{x}_{k-1}$ - preceding estimated state vector
  $z$ - input data vector
  $\Phi$ - transition matrix
  $\Gamma$ - control coupling matrix
  $Q$ - uncertainty matrix of the plant-process model
  $\Upsilon$ - model uncertainty coupling matrix
  $H$ - input/state Jacobian
  $\bar{P}_k$ - predicted covariance matrix of the plant-process model
  $\hat{P}_k$ - current estimated model covariance matrix
  $\hat{P}_{k-1}$ - preceding estimated model covariance matrix
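A direct transcription of formulas (1) through (5) may help fix the notation before the simplifications are introduced. The following is a minimal NumPy sketch, not the WSMR software; the function name `kalman_step`, the 6-state constant-velocity example matrices, and all numerical values are illustrative assumptions.

```python
import numpy as np


def kalman_step(x_prev, P_prev, z, Phi, Gamma, u, Upsilon, Q, H, R):
    """One cycle of the discrete Kalman filter, formulas (1)-(5)."""
    x_pred = Phi @ x_prev + Gamma @ u                        # (1) state prediction
    P_pred = Phi @ P_prev @ Phi.T + Upsilon @ Q @ Upsilon.T  # (2) covariance prediction
    S = H @ P_pred @ H.T + R                                 # bracketed term in (3)
    W = P_pred @ H.T @ np.linalg.inv(S)                      # (3) weighting matrix
    x_est = x_pred + W @ (z - H @ x_pred)                    # (4) state estimation
    P_est = (np.eye(len(x_prev)) - W @ H) @ P_pred           # (5) covariance estimation
    return x_est, P_est, W


# Hypothetical 6-state example: three positions and three velocities,
# positions observed, h = 0.05 s between samples.
h = 0.05
I3, Z3 = np.eye(3), np.zeros((3, 3))
Phi = np.block([[I3, h * I3], [Z3, I3]])
Gamma, u = np.eye(6), np.zeros(6)        # no control input in this example
Upsilon, Q = np.eye(6), 1e-4 * np.eye(6)
H = np.hstack([I3, Z3])                  # 3x6 input/state Jacobian
R = 25.0 * np.eye(3)                     # input variances (assumed values)

x, P = np.zeros(6), 100.0 * np.eye(6)
z = np.array([10.0, -5.0, 2.0])
x, P, W = kalman_step(x, P, z, Phi, Gamma, u, Upsilon, Q, H, R)
```

Even in this small case, (3) needs a 3x3 inverse at every step, which is the cost the component formulation below removes.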
The data from instrumentation radars at WSMR are generated at
20 samples per second in spherical coordinates, i.e., range (R),
azimuth (A), and elevation (E), in the local radar reference systems.
Each system is established by the local tangent plane, with zero
azimuth being true or grid north. It is common practice to
transform these data, prior to filtering, into an arbitrarily located and
oriented Cartesian coordinate system with components $z_1$, $z_2$, and $z_3$.
For Kalman filtering the $z_1$, $z_2$, $z_3$ data from a single radar, the
dimensions of the vectors and matrices in the computational formulas
(1) through (5) are identified as follows:


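  Symbol                                         Dimension
  $\bar{x}_k$, $\hat{x}_k$, $\hat{x}_{k-1}$      6 x 1
  $\Phi$                                         6 x 6
  $u$                                            6 x 1
  $\Gamma$                                       6 x 6
  $\bar{P}_k$, $\hat{P}_k$, $\hat{P}_{k-1}$      6 x 6
  $Q$                                            6 x 6
  $\Upsilon$                                     6 x 6
  $H$                                            3 x 6
  $R$                                            3 x 3
  $I$                                            6 x 6 (unit)
  $W$                                            6 x 3
  $z$                                            3 x 1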

The elements of each of the state vectors $\bar{x}_k$, $\hat{x}_k$, and $\hat{x}_{k-1}$
are three position and three velocity components, while the elements of the
z vector are the three filter inputs. The u vector and the
matrices are correspondingly dimensioned. It is clear from these
dimensions that the straightforward computations, including
the matrix inversion, involved in using the Kalman formulas (1)
through (5) at each computing step would be prohibitive,
particularly when the plant process parameters in $\Phi$, P, and $\Upsilon$, as well as
in H, are time-varying.

The  first  simplifications  in  our  application  of  Kalman  filtering 
to  radar  data  segment  the  basic  formulas  into  three  independent 
filters,  one  corresponding  to  each  Cartesian  component.  To  achieve 
these  simplifications,  the  following  reasonable  assumptions  and 
approximations  are  made: 
(1) The plant process is the radar target, whose mathematical
model represents a point mass with acceleration components that are
independent and slowly varying with time. Thus, each acceleration
component is assumed to be constant during any integration interval.
This assumption makes the component models independent, so that
all of the model matrices and vectors of the basic Kalman formulas
can be suitably partitioned and separated.

(2) The R matrix, which describes the noise characteristics
of the input data components $z_1$, $z_2$, and $z_3$, has non-zero
cross-correlation elements due to the nonlinearity of the R, A, E to
$z_1$, $z_2$, $z_3$ coordinate transformation. In practice, these elements are
so small that the errors in the input data components can be
assumed to be independent, i.e., R is a diagonal matrix.

(3) The foregoing assumptions and approximations allow
partitioning of all the vectors and matrices so that each component of
input data can be independently Kalman filtered. In each of these
three filters it is convenient to augment the $\Phi$ matrix, which is the
same for each filter, increasing its dimensions from 2x2 to 3x3 in
order to eliminate the u vector and the $\Gamma$ matrix.

With  the  above  simplifications  the  dimensions  of  the  component 
Kalman  filters  are: 


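  Symbol                                         Dimension
  $\bar{x}_k$, $\hat{x}_k$, $\hat{x}_{k-1}$      3 x 1
  $\Phi$                                         3 x 3
  $\bar{P}_k$, $\hat{P}_k$, $\hat{P}_{k-1}$      3 x 3
  $Q$                                            3 x 3
  $R$                                            1 x 1
  $H$                                            1 x 3
  $W$                                            3 x 1
  $z$                                            1 x 1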
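$\bar{x}_k$ and $\hat{x}_{k-1}$, and $\bar{P}_k$ and $\hat{P}_{k-1}$, are analogous to (6) and (7),
respectively. If Taylor series integration is used to generate the augmented
$\Phi$ matrix from the model differential equation, i.e.,

$$\ddot{x} = \ddot{x}(t) = \text{constant},$$

then

$$\Phi = \begin{bmatrix} 1 & h & h^2/2 \\ 0 & 1 & h \\ 0 & 0 & 1 \end{bmatrix}, \qquad (9)$$

where

$$h = \Delta t = \text{sampling interval} = 0.05\ \text{sec}.$$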

Also, 


(13) 
When (7), (10), and (12) are substituted into (3), it reduces to the
simple formula

$$w_i = \frac{\bar{p}_{i1}}{\bar{p}_{11} + r}, \qquad i = 1, 2, 3, \qquad (14)$$

in which the matrix inversion in (3) becomes a scalar division
operation.
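The reduction from (3) to a scalar division can be checked numerically. The sketch below assumes, as the component formulation implies, that only the position is observed, so $H = [1\ 0\ 0]$ and R is the scalar r; the covariance values are arbitrary illustrative numbers, not WSMR data.

```python
import numpy as np

r = 4.0                                   # scalar input variance (illustrative)
H = np.array([[1.0, 0.0, 0.0]])           # assumed 1x3 input/state Jacobian
P_bar = np.array([[9.0, 3.0, 1.0],        # an arbitrary symmetric, positive
                  [3.0, 2.0, 0.5],        # definite predicted covariance
                  [1.0, 0.5, 0.3]])

# General Kalman weighting matrix, formula (3): needs a matrix inverse.
W_general = P_bar @ H.T @ np.linalg.inv(H @ P_bar @ H.T + r)

# Component form, formula (14): first column of P_bar over one scalar.
W_scalar = P_bar[:, 0] / (P_bar[0, 0] + r)

print(W_general.ravel())
print(W_scalar)          # both equal [9, 3, 1] / 13
```

Because H selects only the first state, $H\bar{P}H^T$ collapses to $\bar{p}_{11}$ and $\bar{P}H^T$ to the first column of $\bar{P}$.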

A further dramatic reduction in computations is achieved by
applying the QD filter theory to formulas (2) and (5) of the component
Kalman filters. This is the first step in the evolution of the QDK
formulas from the basic Kalman formulas (1) through (5). Starting
with (1), in which $\Gamma$ and u have been eliminated in the component
filter, we have the simplified state prediction formula:

$$\bar{x}_k = \Phi \hat{x}_{k-1}, \qquad (15)$$

in which $\Phi$ is given by (9) and $\bar{x}_k$ and $\hat{x}_{k-1}$ by (6). Thus, (15) and (4) are
the second-order QD filter formulas if, in (4) and (13),


$$w_1 = \frac{60M^2}{10M^3 + 33M^2 + 23M - 6},$$


w2  “  2wj/h, 

W3  •  2wj/h2  -  w2/h, 

in which the development of $w_1$ in (16) is based on a second-degree
curve fitted to an arbitrary span of the last M input data values in a
constrained least-squares sense. The intercept and slope of the fitted
curve are constrained at the time point corresponding to the last input
data value implicitly dropped from the span. Using the QD theory,
approximate functional relationships among the elements of the P matrix
have been developed and applied to formula (2) to obtain the scalar
recursive formulas:


(16) 

(17) 

(18) 




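$$\bar{p}_{11} = \beta^4 \hat{p}_{11} + q, \qquad (19)$$

$$\bar{p}_{21} = 2\alpha\left(\beta^3 \hat{p}_{11} + q\right), \qquad (20)$$

$$\bar{p}_{31} = 2\alpha^2\left(\beta^2 \hat{p}_{11} + q\right), \qquad (21)$$

where:

$$\alpha = \frac{1}{Mh}, \qquad (22)$$

$$\beta = 1 + \frac{1}{M}, \qquad (23)$$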


and q is the model uncertainty scalar corresponding to Q in (2).

(19), (20), and (21) are required in (14) to compute the elements
of the QDK weighting vector. The value of M, the equivalent QD filter
span, in (22) and (23) is retained from the preceding computing step, in
which M is estimated with a polynomial inversion of (16) for the
corresponding computed value of $w_1$, i.e.,

$$M = a_1 + a_2 w_1 + a_3 w_1^2 + a_4 w_1^3. \qquad (24)$$
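The relation between the span M and the position weight, $w_1 = 60M^2/(10M^3 + 33M^2 + 23M - 6)$, and its inversion can be sketched as follows. The coefficients $a_1, \ldots, a_4$ of (24) are not given here, so this sketch substitutes a numerical bisection for the polynomial; that substitution, the function names, and the search bounds are assumptions. The weight formula equals 1 at both M = 1 and M = 1.2 and decreases monotonically beyond 1.2, so the search starts there.

```python
def w1_of_span(M: float) -> float:
    """Position weight of the second-order QD filter as a function of span M."""
    return 60.0 * M**2 / (10.0 * M**3 + 33.0 * M**2 + 23.0 * M - 6.0)


def span_of_w1(w1: float, m_min: float = 1.2, m_max: float = 1.0e4,
               tol: float = 1e-12) -> float:
    """Solve the weight formula for M by bisection (stand-in for the polynomial).

    The weight decreases monotonically for M >= 1.2, where it equals 1,
    so any 0 < w1 < 1 maps to a unique span in [m_min, m_max].
    """
    if w1 >= w1_of_span(m_min):
        return m_min
    if w1 <= w1_of_span(m_max):
        return m_max
    lo, hi = m_min, m_max
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if w1_of_span(mid) > w1:
            lo = mid          # weight still too large: the span must grow
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(w1_of_span(1.0))                          # 1.0, i.e. no smoothing
print(round(span_of_w1(w1_of_span(25.0)), 6))   # 25.0
```

A fitted cubic such as (24) is cheaper per step than bisection, which is presumably why a polynomial was preferred for on-line use.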


The value of $\hat{p}_{11}$ in (19), (20), and (21) is retained from the preceding
computing step, in which it is estimated with the scalar formula

$$\hat{p}_{11} = (1 - w_1)\,\bar{p}_{11}, \qquad (25)$$

derived by substituting (7), (12), and (13) into (5).


From the foregoing exposition we can now summarize the QDK
computational formulas. The first three are the prediction formulas
obtained by using (6) and (9) in (15). The estimation formulas (32),
(33), and (34) are obtained by using (6), (12), and (13) in (4).
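$$\bar{x}_k = \hat{x}_{k-1} + h\,\dot{\hat{x}}_{k-1} + \frac{h^2}{2}\,\ddot{\hat{x}}_{k-1}, \qquad (26)$$

$$\dot{\bar{x}}_k = \dot{\hat{x}}_{k-1} + h\,\ddot{\hat{x}}_{k-1}, \qquad (27)$$

$$\ddot{\bar{x}}_k = \ddot{\hat{x}}_{k-1}, \qquad (28)$$

$$\alpha = 1/(Mh), \qquad (22)$$

$$\beta = 1 + 1/M, \qquad (23)$$

$$\bar{p}_{11} = \beta^4 \hat{p}_{11} + q, \qquad (19)$$

$$\bar{p}_{21} = 2\alpha\left(\beta^3 \hat{p}_{11} + q\right), \qquad (20)$$

$$\bar{p}_{31} = 2\alpha^2\left(\beta^2 \hat{p}_{11} + q\right), \qquad (21)$$

$$w_1 = \bar{p}_{11}/(\bar{p}_{11} + r), \qquad (29)$$

$$w_2 = \bar{p}_{21}/(\bar{p}_{11} + r), \qquad (30)$$

$$w_3 = \bar{p}_{31}/(\bar{p}_{11} + r), \qquad (31)$$

$$\hat{x}_k = \bar{x}_k + w_1(z - \bar{x}_k), \qquad (32)$$

$$\dot{\hat{x}}_k = \dot{\bar{x}}_k + w_2(z - \bar{x}_k), \qquad (33)$$

$$\ddot{\hat{x}}_k = \ddot{\bar{x}}_k + w_3(z - \bar{x}_k), \qquad (34)$$

$$M = a_1 + a_2 w_1 + a_3 w_1^2 + a_4 w_1^3, \qquad (24)$$

$$\hat{p}_{11} = (1 - w_1)\,\bar{p}_{11}. \qquad (25)$$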
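The summary formulas translate almost line for line into a scalar update for one Cartesian component. This is a minimal sketch under stated assumptions, not QDKMST itself: the `QDKState` container, the initial values, the q and r values, and the use of bisection on the span formula (16), $w_1 = 60M^2/(10M^3 + 33M^2 + 23M - 6)$, in place of the unpublished coefficients of (24) are all illustrative choices.

```python
from dataclasses import dataclass


@dataclass
class QDKState:
    x: float     # position estimate
    xd: float    # velocity estimate
    xdd: float   # acceleration estimate
    p11: float   # estimated position variance, from (25)
    M: float     # equivalent QD span, from (24)


def w1_of_span(M):
    """Formula (16)."""
    return 60.0 * M**2 / (10.0 * M**3 + 33.0 * M**2 + 23.0 * M - 6.0)


def span_of_w1(w1, lo=1.2, hi=1.0e4):
    """Bisection on (16); an assumed stand-in for the polynomial (24)."""
    if w1 >= w1_of_span(lo):
        return lo
    if w1 <= w1_of_span(hi):
        return hi
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if w1_of_span(mid) > w1 else (lo, mid)
    return 0.5 * (lo + hi)


def qdk_step(s, z, r, q, h=0.05):
    """One QDK cycle for a single component, formulas (19)-(34)."""
    # Prediction, (26)-(28)
    xb = s.x + h * s.xd + 0.5 * h * h * s.xdd
    xdb = s.xd + h * s.xdd
    xddb = s.xdd
    # Span-dependent constants, (22)-(23)
    alpha = 1.0 / (s.M * h)
    beta = 1.0 + 1.0 / s.M
    # Predicted covariance elements, (19)-(21)
    p11b = beta**4 * s.p11 + q
    p21b = 2.0 * alpha * (beta**3 * s.p11 + q)
    p31b = 2.0 * alpha**2 * (beta**2 * s.p11 + q)
    # Weights, (29)-(31): a scalar division instead of a matrix inverse
    d = p11b + r
    w1, w2, w3 = p11b / d, p21b / d, p31b / d
    # Estimation, (32)-(34); variance (25) and span (24) for the next step
    e = z - xb
    return QDKState(x=xb + w1 * e, xd=xdb + w2 * e, xdd=xddb + w3 * e,
                    p11=(1.0 - w1) * p11b, M=span_of_w1(w1))


def truth(t, a=-9.8):
    """Hypothetical constant-acceleration track used only for illustration."""
    return 100.0 + 50.0 * t + 0.5 * a * t * t


# Noise-free input with a deliberately wrong starting state.
h = 0.05
s = QDKState(x=101.0, xd=45.0, xdd=0.0, p11=4.0, M=3.0)
for k in range(1, 1001):
    s = qdk_step(s, truth(k * h), r=4.0, q=0.5, h=h)
```

Each step costs a few dozen arithmetic operations per component, with no matrix operations at all; with q greater than zero the span settles at a finite value, so the filter keeps adapting rather than freezing its weights.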



The QDK formulation for independently filtering each data
component is simply extended to the multi-station formulation.
To do this, the inverse-variance weighted average of each of the
Cartesian components in a common reference system is used as the
input to the single-component QDK filter. These weighted averages
$\bar{z}$ for N radars are computed with


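$$\bar{z} = \left(\sum_{i=1}^{N} \frac{z_i}{r_i}\right) \Bigg/ \left(\sum_{i=1}^{N} \frac{1}{r_i}\right), \qquad (35)$$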


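in which the $z_i$ are the corresponding component bias-free inputs
with their respective variances $r_i$. In practice, the $z_i$ are not
bias-free. If $\hat{b}_i$ are the bias estimates, then (35) is modified as

$$\bar{z} = \left(\sum_{i=1}^{N} \frac{z_i - \hat{b}_i}{r_i}\right) \Bigg/ \left(\sum_{i=1}^{N} \frac{1}{r_i}\right). \qquad (36)$$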
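The weighted averages above are ordinary inverse-variance means, with the bias estimates subtracted first when they are available. A minimal sketch follows; the function name is hypothetical, and the returned variance $1/\sum_i (1/r_i)$ is the standard variance of such a mean for independent inputs, an addition rather than a formula from the text.

```python
def weighted_component(z, r, b=None):
    """Inverse-variance weighted average of one Cartesian component.

    z: component inputs from N radars; r: their variances;
    b: optional bias estimates (subtracted from z before averaging).
    """
    if b is None:
        b = [0.0] * len(z)
    num = sum((zi - bi) / ri for zi, bi, ri in zip(z, b, r))
    den = sum(1.0 / ri for ri in r)
    return num / den, 1.0 / den   # weighted average and its variance


print(weighted_component([10.0, 12.0], [1.0, 4.0]))              # (10.4, 0.8)
print(weighted_component([10.0, 12.0], [1.0, 4.0], [0.0, 2.0]))  # (10.0, 0.8)
```

The low-variance radar dominates: the second input, four times noisier, pulls the unbiased average only 0.4 units away from the first.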


Thus,  the  QDK  multi-station  formulation  must  include  bias  estimation 
procedures.  In  the  development  of  the  bias  estimation  formulas, 
which  are  also  based  on  Kalman  and  QD  theories,  the  biases  are 
assumed to be slowly varying and independent of each other as well
as of the multi-station estimation. The bias computation formulas are:


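$$\bar{p}_i = \hat{p}_i + q_i, \qquad i = 1, 2, \ldots, N, \qquad (37)$$

where:

$$w_i = \frac{\bar{p}_i}{\bar{p}_i + r_i}, \qquad (38)$$

$$\hat{b}_i = \bar{b}_i + w_i\left(z_i - z_b - \bar{b}_i\right), \qquad (39)$$

$$\hat{p}_i = (1 - w_i)\,\bar{p}_i, \qquad (40)$$

and

  $\bar{p}_i$, $\hat{p}_i$ - bias model variance scalars (the $\hat{p}_i$ in (37) is
      the value retained from the preceding step),
  $\bar{b}_i$ - predicted bias, i.e., the preceding estimate $\hat{b}_i$,
  $q_i$ - model uncertainty scalars,
  $k$ - weighting factor for $q_i$,
  $w_i$ - bias weighting coefficients,
  $z_b$ - arbitrary bias reference.

$z_b$ may be optical data components known to be essentially bias-free,
the component inputs of one particular radar, or the unweighted
average of the $z_i$:

$$z_b = \frac{1}{N}\sum_{i=1}^{N} z_i. \qquad (41)$$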
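The bias recursion amounts to N independent scalar filters, one per radar, each tracking the offset of its radar's input from the reference $z_b$. The sketch below assumes the reading $\hat{b}_i = \bar{b}_i + w_i(z_i - z_b - \bar{b}_i)$ with the predicted bias $\bar{b}_i$ taken as the preceding estimate, uses the unweighted average (41) as reference, and omits the factor k; the function name, initial values, and $q_i$ are hypothetical.

```python
def bias_update(b_prev, p_prev, z, r, q):
    """One step of the relative-bias filters, formulas (37)-(41).

    b_prev, p_prev: preceding bias estimates and variances, one per radar
    z, r: current component inputs and their variances
    q: model uncertainty scalars
    """
    n = len(z)
    z_b = sum(z) / n                          # (41) unweighted-average reference
    b_new, p_new = [], []
    for i in range(n):
        p_bar = p_prev[i] + q[i]              # (37)
        w = p_bar / (p_bar + r[i])            # (38)
        b_new.append(b_prev[i] + w * (z[i] - z_b - b_prev[i]))  # (39)
        p_new.append((1.0 - w) * p_bar)       # (40)
    return b_new, p_new


# Three radars tracking the same point with constant biases (noise-free).
true_bias = [3.0, 0.0, -1.5]
b, p = [0.0] * 3, [1.0] * 3
for step in range(300):
    truth = 0.2 * step
    z = [truth + tb for tb in true_bias]
    b, p = bias_update(b, p, z, r=[1.0] * 3, q=[0.01] * 3)
print([round(bi, 6) for bi in b])   # [2.5, -0.5, -2.0]
```

With the unweighted average as reference, the estimates converge to each bias minus the mean bias (here 0.5), which is why the text calls them relative biases.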
THE QDKMST PROGRAM FOR THE AUTOMATED RADAR DATA PROCESSING
SYSTEM. QDKMST employs the QDK filter described in the preceding
chapter and is modularized into ten subroutines, one of which
contains three QDK filter/smoothers for each set of radar data.

QDKMST  includes  pre-execution,  initializing,  and  data  subprograms. 

In addition to concurrently generated trajectory and derivative
estimates for each set of radar data, QDKMST also optionally
provides corresponding multi-station estimates along with relative
trajectory bias estimates at the R, A, E level for each set of
radar data. An alternate option consists of estimates selected,
on a point-by-point basis, from those of the "minimum total
variance" radar whose data have been bias corrected.

QDKMST employs a total of 354 zero-, first-, and second-order QD
filters (References 3, 4, 5) in addition to the 33 QDK filter/smoothers to generate
the estimates for ten sets of radar data and a multi-station solution.

Execution  time  per  set  of  observed  data  per  radar  for  QDKMST 
on  any  of  the  UNIVAC  1108  computer  systems  is  less  than  4.0 
milliseconds.  For  ten  sets  of  radar  data  the  multi-station  estimates 
per  sample  time  require  approximately  15  milliseconds.  If  the 
radar  data  are  edited  at  the  R,  A,  E  level  at  20  samples  per  second 
and  then  averaged  down  to  five  samples  per  second  prior  to 
transforming  the  data  to  the  common  Cartesian  coordinate  system, 
then the data from ten radars can be reduced, including the
multi-station or minimum variance estimates, in less than 40 milliseconds,
allowing  160  milliseconds  for  data  handling,  executive  overhead, 
other  batch  processing,  and  a  reasonable  safety  margin  within  the 
200  millisecond  on-line  execution  period. 

QDKMST contains the same radar error model for calibration
corrections as that described in the WSMR Data Reduction
Handbook (Reference 2). The same refraction index computation is also used.
The refraction correction with spherical earth geometry, however,
is the same as that used in the real-time programs which support
ATHENA and PERSHING.


QDKMST itself is the calling routine for each of ten
subroutines, whose functions are outlined as follows. The FORTRAN
source program includes definitions of all the parameters and
variables of the COMMON statements.

INTQDK 

a.  Initializes  QDKMST  variables. 

EDIT 

a.  Accepts  range,  azimuth,  and  elevation  observations  from 
each  tracking  radar; 

b. Initializes QD editing filters when several consecutive
4th differences of the R, A, E data are less than some limit;

c. After initialization, the QD editing filters test for
spikes and, when spikes are detected, substitute predicted filter
values;

d. Reinitializes the editing process after total data dropouts;

e.  Estimates  variances  of  edited  R,  A,  E  data; 

f. Generates acceptable data flags for each set of radar data; and

g. Optionally averages edited data from 20 samples per second
down to five samples per second.

RCXFRM

a.  Corrects  edited  range  and  elevation  data  for  refraction  and 
applies  calibration  corrections;  and 

b. Transforms edited R, A, E data for each radar to an arbitrarily
selected common Cartesian reference system.

MSxBS  (Optional) 

a.  Estimates  R,  A,  E  biases  for  each  radar  relative  to  one  of 
three  references,  optionally  selected; 
b. Generates a variance weighted average of available sets
of bias corrected data in the common Cartesian coordinate system;
and

c. Identifies the minimum total variance radar.

QDK 

a. Generates smoothed trajectory and derivative estimates
for each set of radar data and the multi-station set of data
(optional);

b.  Estimates  variances  of  QDK  Cartesian  coordinate  input 
data  and  generates  point-by-point  scalar  quantities  corresponding 
to  the  Kalman  R,  Q,  and  P  matrices;  and 

c. As an alternate to the multi-station estimates, optionally
selects trajectory and derivative estimates from the minimum total
variance radar on a point-by-point basis.

STAT 

a.  Generates  error  estimates  (variances)  of  the  QDK  smoothed 
Cartesian  component  positions,  velocities,  and  accelerations. 

b.  Estimates  the  means  of  the  Cartesian  component  residuals  as 
a  measure  of  smoothing  distortion. 

c. Generates QDK noise attenuation factors.

d. Generates the means of the R, A, E residuals and bias
estimates.

e.  All  estimates  are  generated  on  a  point-by-point  basis. 

BLOCK  DATA 

Contains  all  of  the  input  parameters  and  constants  which  are 
compiled  into  QDKMST. 

PREMIS 

a.  Computes  a  variety  of  parameters  and  constants  not  contained 
in  BLOCK  DATA; 
b. Computes QD filter coefficients;

c.  Computes  parameters  for  refraction  corrections. 

INTCAL  (called  by  PREMIS) 

a.  Computes  parameters  for  calibration  corrections  and  applies 
tilt  corrections  to  rotation  matrices. 
ALGORITHM FOR EDITING BIVARIATE DATA FILES WITH
RANDOM SPACING IN THE INDEPENDENT VARIABLE

1LT  L.D.  Clements 
Data  Reduction  Section 
Yuma  Proving  Ground 
Yuma,  Arizona 


Methods  for  smoothing  equally  spaced,  bivariate  data  have  been 
under  development  for  the  past  one  hundred  years.  A  wide  variety  of 
techniques  have  been  adopted  for  this  purpose  (see  Whittaker  and  Robinson, 
The  Calculus  of  Observations,  Dover,  1967).  A  special  case  within  the 
larger  problem  of  data  smoothing  is  the  need  to  remove  points  which  are 
grossly in error with respect to the surrounding data. Again, techniques
are  available  for  use  with  evenly  spaced  data  (see  Handbook  of  Data 
Reduction  Methods,  White  Sands  Missile  Range,  1964).  In  the  literature 
available  to  the  author,  however,  there  appears  to  be  no  method  available 
for  directly  editing  gross  values  out  of  a  bivariate  data  file  with  uneven 
spacing  in  the  independent  variable.  The  intent  of  this  paper  is  to 
introduce  such  a  technique. 

DEVELOPMENT  OF  TECHNIQUE.  Consider  first  the  equally  spaced  string 
of  numbers  given  in  Table  1  and  plotted  in  Figure  1.  It  is  obvious  from 
the  figure  that  point  number  11  is  in  error,  as  are  points  21,  22,  and  23. 
The  question  then  becomes,  does  a  single  "bad"  point  or  series  of  "bad" 
points generate a discernible pattern in the derivatives (actually
differences) which may be taken? The table and figure show that indeed there
is  a  definite  pattern  of  anomalous  difference  values  generated  by  erroneous 
points. 

The approach used was then to say: if a given pattern of anomalous values
is present in the sequence of derivatives, then the point(s) generating
this pattern must be incorrect. Symbolically, let us define for a datum
pair $(x_i, t_i)$ the first difference pair $(v_{i,i+1}, t_{i,i+1})$, where

$$v_{i,i+1} = \frac{x_{i+1} - x_i}{t_{i+1} - t_i} \qquad (1)$$

and

$$t_{i,i+1} = \frac{t_{i+1} + t_i}{2}, \qquad (2)$$

and the second difference

$$a_{i,i+1,i+2} = \frac{v_{i+1,i+2} - v_{i,i+1}}{t_{i+1,i+2} - t_{i,i+1}}. \qquad (3)$$
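Formulas (1) through (3) apply to any spacing of the $t_i$. As a check, for $x = t^2$ each first difference equals $t_{i+1} + t_i = 2t_{i,i+1}$, so every second difference is exactly 2 whatever the spacing. A minimal sketch (the function name is an assumption):

```python
def differences(t, x):
    """First and second differences of unevenly spaced data, formulas (1)-(3).

    Returns (v, tm, a) with v[i] = v_{i,i+1}, tm[i] = t_{i,i+1},
    and a[i] = a_{i,i+1,i+2}.
    """
    v = [(x[i + 1] - x[i]) / (t[i + 1] - t[i]) for i in range(len(t) - 1)]   # (1)
    tm = [0.5 * (t[i + 1] + t[i]) for i in range(len(t) - 1)]                 # (2)
    a = [(v[i + 1] - v[i]) / (tm[i + 1] - tm[i]) for i in range(len(v) - 1)]  # (3)
    return v, tm, a


t = [0.0, 0.3, 1.0, 1.4, 2.5, 2.6]
v, tm, a = differences(t, [ti * ti for ti in t])
print([round(ai, 9) for ai in a])   # [2.0, 2.0, 2.0, 2.0]
```

In this indexing, a bad point $x_i$ disturbs `v[i-1]`, `v[i]` and `a[i-2]`, `a[i-1]`, `a[i]`, which is the pattern described next.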

Now if the point $x_i$ is bad, then $v_{i-1,i}$, $v_{i,i+1}$, $a_{i-2,i-1,i}$,
$a_{i-1,i,i+1}$, and $a_{i,i+1,i+2}$ are abnormal with respect to surrounding values.
Similarly, if points $x_i, \ldots, x_{i+n}$ are offset in the same direction,
such as points 21, 22, 23 in Figure 1, then $v_{i-1,i}$ and $v_{i+n,i+n+1}$ and
$a_{i-2,i-1,i}$, $a_{i-1,i,i+1}$, $a_{i+n-1,i+n,i+n+1}$, and $a_{i+n,i+n+1,i+n+2}$ are
abnormal. Although carrying out this sequence of calculations by hand is
impractical, a digital computer can rapidly use this pattern identification
technique to locate bad points in an unevenly spaced bivariate data stream.


FEATURES  OF  THE  COMPUTER  ALGORITHM.  The  key  feature  in  determining 
a  suitable  algorithm  for  editing  using  the  pattern  recognition  scheme 
outlined  above  is  answering  the  question:  When  is  a  point  grossly  in  error? 
Erroneous  points  on  a  graph  stand  out  only  if  they  are  far  outside  of  the 
normal  spread  of  the  data.  In  the  algorithm  described  above,  editing  is 
dependent upon identifying anomalous values of $\Delta x/\Delta t$ and $\Delta^2 x/\Delta t^2$.
The test used is: if $\Delta x/\Delta t$ or $\Delta^2 x/\Delta t^2$ is greater than 2.5 to 3.5
times the previously established values, then the derivative magnitude is excessive.

The  choice  of  this  constant  multiplier  is  necessarily  arbitrary,  but 
dependent  upon  the  level  of  noise  present  in  the  bivariate  data  stream. 
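The text leaves open how the "previously established values" are formed. The sketch below is one hypothetical choice: the reference is the median magnitude of the last few accepted first differences, and a point is suspected when the two first differences that share it, $v_{i-1,i}$ and $v_{i,i+1}$, are both flagged. The window length, the median, and the multiplier k = 3.0 (inside the quoted 2.5 to 3.5 range) are assumptions, as are the function names.

```python
from statistics import median


def first_differences(t, x):
    """Formula (1) for unevenly spaced data."""
    return [(x[i + 1] - x[i]) / (t[i + 1] - t[i]) for i in range(len(t) - 1)]


def flag_excessive(values, k=3.0, window=5):
    """Flag values whose magnitude exceeds k times a reference level.

    Assumed reference: median |value| of the last `window` accepted values.
    The first `window` values are accepted unchecked to seed the reference.
    """
    flags, accepted = [False] * len(values), []
    for j, val in enumerate(values):
        if len(accepted) >= window:
            ref = median(abs(u) for u in accepted[-window:])
            if abs(val) > k * ref:
                flags[j] = True
                continue          # keep anomalous values out of the reference
        accepted.append(val)
    return flags


def isolated_bad_points(t, x, k=3.0, window=5):
    """Point i is suspect when both v_{i-1,i} and v_{i,i+1} are flagged."""
    flags = flag_excessive(first_differences(t, x), k, window)
    return [i for i in range(1, len(flags)) if flags[i - 1] and flags[i]]


t = [0.0, 0.1, 0.25, 0.3, 0.45, 0.6, 0.7, 0.85, 1.0, 1.1, 1.3, 1.4]
x = [2.0 * ti + 1.0 for ti in t]
x[6] += 5.0                          # a single gross error
print(isolated_bad_points(t, x))     # [6]
```

A magnitude-based reference like this one works poorly where the true derivative passes through zero; the same test applied to second differences, as the text also does, helps cover that case.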

If  an  isolated  point  is  identified  as  being  bad  the  program  user  can 
either  drop  the  point  out  of  the  file,  or  it  may  be  replaced.  When  the 
replacement  option  is  exercised,  a  quadratic  equation  is  fit  to  eight 
points  surrounding  the  point  in  error  and  a  corrected  value  calculated. 
However,  among  the  eight  points,  if  more  than  two  have  been  identified  as 
bad  points  themselves,  the  fit  is  not  made  and  the  point  in  question  is 
dropped from the file. A sequence of erroneous points, once identified,
is dropped from the file.
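The replacement rule can be sketched with a least-squares quadratic from NumPy. Which eight neighbours are used, and whether already-flagged neighbours are excluded from the fit, are not stated; this sketch assumes four points on each side and excludes flagged ones. The function name and test data are hypothetical.

```python
import numpy as np


def replacement_value(t, x, i, bad, half=4, max_bad=2):
    """Quadratic-fit replacement for point i, or None if it must be dropped.

    t, x: data arrays; bad: set of indices already identified as bad.
    Uses up to `half` points on each side of i (eight in total).
    """
    lo, hi = max(0, i - half), min(len(t), i + half + 1)
    neighbours = [j for j in range(lo, hi) if j != i]
    if sum(j in bad for j in neighbours) > max_bad:
        return None                         # too many bad neighbours: drop point
    good = [j for j in neighbours if j not in bad]
    coeffs = np.polyfit([t[j] for j in good], [x[j] for j in good], 2)
    return float(np.polyval(coeffs, t[i]))


t = np.array([0.0, 0.2, 0.5, 0.6, 0.9, 1.1, 1.2, 1.5, 1.7, 1.8, 2.1])
x = 1.0 + 2.0 * t - 0.5 * t**2
x[5] = 40.0                                            # gross error at index 5
print(round(replacement_value(t, x, 5, {5}), 6))       # 2.595, the true value
print(replacement_value(t, x, 5, {2, 3, 4, 5}))        # None
```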


TESTING OF THE ALGORITHM. Noisy data were generated from two
analytical equations in the following manner. A starting value of the
independent variable and a standard increment size were decided upon. A
uniform-distribution random number generator then was used to calculate a
fraction $f_i$ of the increment size to be used, and either

$$t_{i+1} = t_i + f_i\,\Delta t \qquad (4)$$

or

$$t_{i+1} = t_i + \Delta t + f_i\,\Delta t \qquad (5)$$

was used to calculate a new value of the independent variable. A
second random number was generated, and the ratio

$$\Delta = f_1/f_2; \quad \text{if } f_1 < 0.5,\ \Delta = -\Delta \qquad (6)$$

was used to generate the noisy dependent variable from the relation

$$x_i = F(t_i) + \Delta \left.\frac{dF}{dt}\right|_{t = t_i}, \qquad (7)$$

where $F(t)$ is the bivariate functional form being used. The two
analytical equations used were

$$x_i = 100 - 10.2\,t_i - 0.5\,t_i^2 \qquad (8)$$

and

$$x_i = 0.1\,t_i \sin(t_i/\pi). \qquad (9)$$

Equation (8) was used as a crude simulation of a terminal trajectory
record, while equation (9) was of interest to study the effects of
low-order oscillations on the algorithm.
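The test-data recipe (4) through (9) can be sketched as below. The seed, the function names, and the use of `1 - random()` to keep $f_2$ strictly positive are assumptions; the text does not say how a zero divisor in (6) was handled. Because $\Delta$ is a ratio of two uniform numbers, it is usually of order one but occasionally very large, which is what produces the gross points.

```python
import math
import random


def F8(t):
    return 100.0 - 10.2 * t - 0.5 * t * t          # (8)


def dF8(t):
    return -10.2 - t


def F9(t):
    return 0.1 * t * math.sin(t / math.pi)         # (9)


def dF9(t):
    return 0.1 * math.sin(t / math.pi) + 0.1 * t * math.cos(t / math.pi) / math.pi


def noisy_file(F, dF, t0, dt, n, rule=4, seed=1):
    """Generate n unevenly spaced noisy points following (4) or (5), (6), (7)."""
    rng = random.Random(seed)
    t, x, ti = [], [], t0
    for _ in range(n):
        f1 = rng.random()
        ti = ti + f1 * dt if rule == 4 else ti + dt + f1 * dt   # (4) or (5)
        f2 = 1.0 - rng.random()          # in (0, 1]: avoids a zero divisor
        delta = f1 / f2                  # (6)
        if f1 < 0.5:
            delta = -delta
        t.append(ti)
        x.append(F(ti) + delta * dF(ti))  # (7)
    return t, x


t, x = noisy_file(F8, dF8, 0.0, 0.1, 200, rule=5)
```

Scaling the perturbation by $dF/dt$ makes the noise proportional to the local slope, so steep portions of the curve receive larger errors.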

The noisy input data from equations (4) and (8) are shown in
Figure 2 and the edited data in Figure 3. Note that there is a difference
of an order of magnitude in the $x_i$ scale, which accounts for the apparent
increased spread of the data. Similar results from equations (5) and
(8) are shown in Figures 4 and 5. As is evident from Figures 6 and 7,
generated by equations (5) and (9), oscillations in the data file do
degrade performance somewhat. Overall results are encouraging, however,
because the algorithm does identify and remove most of the
gross points. Data files, once edited, may then be passed on to more
refined smoothing routines to eliminate inherent noise.

CONCLUSIONS. Grossly erroneous points in a varyingly spaced,
bivariate data file may be identified and either corrected or removed
using the algorithm described in this paper. Data screened in this
manner are then suitable for further smoothing and/or processing.

Note: Copies of the algorithm described here in FORTRAN IV are
available  upon  request  from 

Commander 
Table 1: Patterns of Anomalous Differences in
Bivariate Data Stream

Figure 1: Generated Patterns in First and Second Differences from Erroneous Data
STATISTICAL ANALYSIS OF H. F. OBLIQUE AND VERTICAL
INCIDENCE  IONOSPHERIC  DATA  APPLICABLE  TO  FIELD  ARMY  DISTANCES 

Richard  J.  D'Accardi 
Chris  P.  Tsokos* 

U.  S.  Army  Electronics  Command 
Fort  Monmouth,  New  Jersey 

ABSTRACT.  The  object  of  this  paper  is  to  present  a  statistical  approach 
for  the  analysis  and  interpretation  of  short-path  oblique  incidence  and 
vertical  incidence  ionospheric  soundings  over  typical  field  army  distances. 

Univariate  spectral  analysis  is  performed  on  the  non-stationary 
stochastic  realization  of  the  oblique  and  vertical  incidence  data  taken  over 
the  60  Km  path  between  Fort  Monmouth,  New  Jersey,  and  Fort  Dix,  New  Jersey. 
Estimates  of  the  power  spectrum  are  obtained  using  three  "lag  windows," 
namely,  those  of  Bartlett,  Tukey,  and  Parzen,  respectively.  A  specific 
truncation length has been obtained for each of the above windows so that,
regardless of which one is utilized, the same approximate estimate of the
power  spectrum  will  be  obtained.  In  addition,  bivariate  spectral  analysis 
of  the  vertical  and  oblique  incidence  data  is  given. 


*  Department  of  Mathematics,  University  of  South  Florida,  Tampa,  Florida 
1. INTRODUCTION


The deployment of a Field Army necessitates many means of communication,
especially those not limited by line-of-sight, extended distances, and
intervening terrain obstacles. High frequency (HF) communications systems are not
so stringently limited, and provide excellent back-up for higher frequency
and high density systems. HF, too, has limitations, foremost of which is the
ionosphere, a medium which is time variant, random, and, to say the least,
highly unstable. Although much attention has been given to developing and
fielding superior equipment, far less attention has been given to improving
the use of the propagation media. It is in this regard that the
Communications/ADP Laboratory of the U. S. Army Electronics Command has sponsored
three experiments aimed at developing media-system parameters to provide
tactical HF communicators with propagation predictions, in near real-time,
for a typical Field Army area of influence. Specifically, both vertical
incidence and short-path oblique incidence ionosonde data were taken over
60 Km, 200 Km, and 500 Km paths and analyzed.

Work by D'Accardi-Tsokos-Kulinyi [1971] was the first to treat
short-path ionospheric data as a stochastic realization, as opposed to
analysis and forecasting on the basis of specific "blocks" of time of day
(Ames-Egan [1958]), each block being considered independent and homogeneous.

Their results introduced a new statistical concept to the estimation of
short-path oblique incidence (OI) ionospheric data and provided statistical models
to forecast either oblique or vertical incidence soundings over specific paths.
With respect to their first objective, regression techniques were used to
relate vertical incidence (VI) soundings to OI soundings as the first part of
the forecasting problem. This was a practical alternative to the widely
accepted secant law, which, due to the lack of mid-path data, the
assumption of a stratified ionosphere, and the difficulty in scaling
virtual height at the critical frequencies, yielded poor results when applied
over the longer path experiments. With regard to their second objective, the
actual forecasting, they showed that both the vertical and oblique
soundings are non-stationary stochastic realizations; that is, they form a
discrete time series that is not in statistical equilibrium. Their data were
characterized by autoregressive, moving average, and mixed processes.
This approach has pointed out that more information can be obtained from the
data with respect to the development of system parameters (Krause et al [1970]).
As a continuation of this effort, the aim of this paper is to utilize the
information gained by D'Accardi, Kulinyi, and Tsokos in the analysis of the
power spectrum, that is, to describe in detail how the variance of the
stochastic realization (non-stationary ionospheric data) is distributed with
frequency.

In Section 3 we shall give some basic concepts of time series analysis,
defining stationary and non-stationary processes. The procedure for
forecasting is also given, including the difference equation for forecasting the
average oblique incidence soundings for the 60 Km experiment. The model is
used in the spectral analysis of the oblique incidence series.

Section 4 is devoted to basic concepts and a systematic procedure for the spectral analysis of OI soundings. The theoretical spectrum, $\Gamma_{YY}(f)$, is shown for the general discrete autoregressive (AR) process, and specifically for a third order AR process. Estimates of the spectral density function using the lag windows of Bartlett, Tukey, and Parzen are given, including the criteria for choosing the best one for estimating $\Gamma_{YY}(f)$. For various truncation lengths, L, the bandwidth, 95% confidence intervals, and degrees of freedom are given for each window. The bivariate behavior of the oblique and vertical incidence ionospheric soundings for the 60 Km experiment is discussed in Section 5. More specifically, we obtain estimates of the smoothed co-spectrum and of the quadrature, phase, and cross-amplitude spectra using all three lag windows. Estimates of the coherency spectrum are also given.

2.  DESIGN  OF  THE  EXPERIMENT 

The experimental distances of 60 Km, 200 Km, and 500 Km were chosen to fall within the idealized 300 x 300 kilometer tactical Field Army area of responsibility. The diagonal distance of such an area is approximately 424 Km and represents the longest internal communications path. With Fort Monmouth, N. J., as a base station, mobile ionosonde terminals were set up at Fort Dix, N. J. (60 Km), Aberdeen Proving Ground, Md. (200 Km), and Camp Drum, N. Y. (500 Km), as shown in Figure 1. The analysis presented herein is based upon the 60 Km path data, but similar results were obtained over the other paths and will be presented at a later time.

Each ionosonde terminal of the 60 Km path, operating in the 2-16 MHz range, made scheduled soundings every 10 minutes for eighteen days over a twenty-one day period. While the Fort Monmouth terminal was transmitting and receiving its own signal, the Fort Dix terminal would simultaneously receive the same transmission; likewise for transmissions from Fort Dix to Fort Monmouth (see Figure 2). Both ionosondes* were synchronized so that the "remote" sounder scans would coincide precisely with those of the fixed station. The number of days over which the experiment was performed has no significance with respect to the results obtained; it was determined by funding.

The frequency range of the sounders spanned three "octaves," 2-4 MHz, 4-8 MHz, and 8-16 MHz, each of which contained 400 discrete channels. Transmissions consisted of successively "stepping" through the channels of each octave with 100 μs pulses. The resulting data are a recording of these pulses, as they return from the ionosphere, parametric in frequency and time delay.

The time delay is a measure of the virtual height of reflection from the ionospheric layers. Figure 3 shows an ionogram record of frequency vs. time delay. These records were taken on 35 mm film at Fort Monmouth and on light-sensitive oscillograph paper at the remote terminal. After collection and development, the ionograms were scaled for the extraordinary critical frequencies, $f_xF_2$ (see Figure 3), and the resulting data were compiled for computer analysis. Some data (ionograms) were unreadable due to man-made noise and to solar and geomagnetic activity. For those records that were unreadable (though a signal was detected), simulated data were prepared. The occurrence of obscured data was negligible over the experiments.


*These instruments were Granger Associates Model 3905-5 systems, matched with wide-response delta antennas.


Figure  2.  EXPERIMENTAL  LAYOUTS 


3. BASIC PROCEDURE FOR FORMULATING THE FORECASTING MODEL


3.1 Basic Concepts in Time Series Analysis:

Any phenomenon that changes with time, such as the oblique incidence soundings, and any collection of data that measures aspects of such a phenomenon, can be considered as a time series. A time series can be either a deterministic or a non-deterministic function of an independent variable, usually time; in most physical situations, it will be a non-deterministic function. A non-deterministic function exhibits random or fluctuating properties and, hence, its future values cannot be forecast exactly. Thus, a non-deterministic time series can only be described by statistical laws or models. We begin by assuming that a time series at a given time, t, can be described by a random variable and its associated probability distribution function. In this manner, we may describe the behavior of a time series at all instants by an ordered set of random variables and the associated probability distributions. Such an ordered set of random variables is called a stochastic process. Thus, an observed time series, $y_t$, can be considered as one realization of an infinite ensemble of functions that might have been generated by a stochastic process. Such a process is said to be strictly stationary if the joint probability distribution of any set of observations is not affected by shifting all times of the observations forward or backward by any integer amount, k. A stationary stochastic process can be described in terms of its mean, μ, which is estimated by:

$$\bar{y} = \frac{1}{n}\sum_{t=1}^{n} y_t, \qquad (3.1.1)$$

its variance, $\sigma^2$, which is estimated by:

$$s^2 = \frac{1}{n}\sum_{t=1}^{n} (y_t - \bar{y})^2, \qquad (3.1.2)$$

its sample autocovariance function, which measures the extent to which two random variables (here, observations k time units apart) are linearly dependent, estimated by:

$$c_{yy}(k) = \frac{1}{n}\sum_{t=1}^{n-k} (y_t - \bar{y})(y_{t+k} - \bar{y}), \qquad k = 0, 1, 2, \ldots, n-1, \qquad (3.1.3)$$

and its sample autocorrelation function, which acts like a correlation coefficient and is estimated by:

$$r_{yy}(k) = \frac{c_{yy}(k)}{c_{yy}(0)}, \qquad k = 0, 1, \ldots, n-1. \qquad (3.1.4)$$
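As a minimal sketch (Python with NumPy, which is of course not what the original analysis used), the estimators (3.1.1) through (3.1.4) can be computed directly with the 1/n divisor used in the text. The function name `sample_stats` is our own, hypothetical helper.

```python
import numpy as np

def sample_stats(y):
    """Return the sample mean (3.1.1), variance (3.1.2), autocovariances
    c_yy(k) (3.1.3) and autocorrelations r_yy(k) (3.1.4), k = 0..n-1."""
    y = np.asarray(y, dtype=float)
    n = y.size
    ybar = y.mean()                                    # (3.1.1)
    d = y - ybar
    s2 = np.sum(d ** 2) / n                            # (3.1.2), divisor n
    # c_yy(k) = (1/n) * sum_{t=1}^{n-k} (y_t - ybar)(y_{t+k} - ybar)
    c = np.array([np.sum(d[: n - k] * d[k:]) / n for k in range(n)])  # (3.1.3)
    r = c / c[0]                                       # (3.1.4)
    return ybar, s2, c, r

ybar, s2, c, r = sample_stats([2.0, 4.0, 6.0, 8.0])
# ybar = 5.0, s2 = 5.0, r = [1.0, 0.25, -0.3, -0.45]
```

Note that $c_{yy}(0)$ equals $s^2$, so $r_{yy}(0) = 1$ by construction.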


3.2 Stationary vs. Non-Stationary Time Series:

A stationary time series is one that is in statistical equilibrium, in the sense that its properties do not change significantly with time. A non-stationary time series is one whose properties change with time. The information with which the present study is concerned, namely ionospheric soundings, is non-stationary in nature. In general, non-stationary phenomena can be divided into three basic classes:

(a) Those time series that exhibit stationary properties over a long period of time.

(b) Those that are approximately stationary over short periods of time.

(c) Those time series that exhibit non-stationary properties; that is, their visual properties change continuously with respect to time.

In the present state of the art, techniques exist to analyze stationary time series information; however, the techniques available for the analysis and interpretation of non-stationary time series information, such as ionospheric data, are inadequate and do not lend themselves to meaningful interpretations of physical situations. It is possible, however, to adjust a non-stationary time series so that the existing techniques of stationary time series analysis can be applied. This adjustment takes the form of applying a suitable filter to the observed non-stationary time series to remove the non-stationary components.

The search for a mathematical function to transform the non-stationary time series into a stationary series is in some respects a trial-and-error procedure. One of the most popular and most efficient methods of accomplishing this is the application of a difference equation (see Jenkins and Watts [1968]; Box and Jenkins [1970], among others). A first order difference equation is defined by:

$$\nabla y_t = y_t - y_{t-1}, \qquad (3.2.1)$$

where $y_t$ is the observed non-stationary series and $\nabla y_t$ is the first difference series.

Similarly, a second order difference equation is defined by:

$$\nabla^2 y_t = \nabla y_t - \nabla y_{t-1} = y_t - 2y_{t-1} + y_{t-2}, \qquad (3.2.2)$$

and so on. In practice, a first or second order difference equation is usually sufficient to transform most non-stationary time series.
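The difference filters (3.2.1) and (3.2.2) are easy to check numerically. The sketch below (helper names are hypothetical) shows that a second difference reduces a quadratic trend to a constant, and that it shortens the series by two values, which is consistent with the filtered oblique series later being indexed from t = 3 rather than t = 1.

```python
import numpy as np

def first_difference(y):
    """Equation (3.2.1): the series y_t - y_{t-1}, one value shorter than y."""
    y = np.asarray(y, dtype=float)
    return y[1:] - y[:-1]

def second_difference(y):
    """Equation (3.2.2): y_t - 2*y_{t-1} + y_{t-2}, two values shorter than y."""
    return first_difference(first_difference(y))

t = np.arange(10, dtype=float)
y = 3.0 + 0.5 * t + 0.2 * t ** 2      # quadratic trend: non-stationary in the mean
d1 = first_difference(y)              # still trends linearly in t
d2 = second_difference(y)             # constant 2 * 0.2 = 0.4 at every point
```

NumPy's `np.diff(y, n=2)` computes the same second difference; the explicit helpers are kept only to mirror the two equations.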

To identify whether the observed time series exhibits stationary or non-stationary properties, we make use of the following three basic concepts:

(a) Visual interpretation of the series.

(b) A plot of the sample autocorrelation function of the observed series.

(c) Application of various trend tests to the observed series.

The graphical representation of the observed series can be of practical help. However, for a more rigorous classification of the series, we must rely on the latter two concepts. For the observed series and its first and second differences, one computes the sample autocorrelation function using equation (3.1.4) and conducts trend tests, such as Kendall's tau (Kendall and Stuart [1966]). The sample autocorrelation function of a stationary phenomenon has the basic property that it dies out fairly rapidly; that is, it approaches zero as the lag k increases. Also, a stationary series contains no trend. Following this procedure, one can obtain sufficient information to determine whether the observed series exhibits stationary or non-stationary components and, if it exhibits non-stationary components, whether a first or second order difference equation would filter them out.
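The trend test of item (c) can be sketched as follows, assuming the simplest form of Kendall's tau: the rank correlation between $y_t$ and the time index t, without a correction for ties, and with the usual large-sample normal approximation under the hypothesis of no trend. This is an illustrative implementation, not necessarily the exact variant the authors used; the function name is hypothetical.

```python
import math
import numpy as np

def kendall_trend_test(y):
    """Kendall's tau between y_t and the time index t (tau-a, no tie correction),
    with a two-sided p-value from the large-sample normal approximation."""
    y = np.asarray(y, dtype=float)
    n = y.size
    # For i < j the time index always increases, so each pair contributes
    # sign(y_j - y_i): +1 if concordant, -1 if discordant, 0 if tied.
    s = sum(int(np.sum(np.sign(y[i + 1:] - y[i]))) for i in range(n - 1))
    tau = 2.0 * s / (n * (n - 1))
    var_tau = 2.0 * (2 * n + 5) / (9.0 * n * (n - 1))   # null variance of tau
    z = tau / math.sqrt(var_tau)
    p = math.erfc(abs(z) / math.sqrt(2.0))              # two-sided p-value
    return tau, z, p

trend = np.arange(30, dtype=float) ** 2                 # strongly trending series
tau, z, p = kendall_trend_test(trend)                   # tau = 1.0, tiny p-value
tau2, z2, p2 = kendall_trend_test(np.diff(trend, n=2))  # constant: tau = 0, p = 1
```

A tau near ±1 with a small p-value suggests a trend; applying the same test to the first and second differences indicates whether differencing has removed it.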

Having reduced the given information to a stationary time series, our aim is to fit a parametric model to this series: an autoregressive model, a moving-average model, or a combination of the two. These stationary stochastic models assume that the process (series) remains in equilibrium about a constant mean level, and they are of great value in modeling stationary time series. The general autoregressive process is given by:

$$Y_t - \mu = \alpha_1 (Y_{t-1} - \mu) + \cdots + \alpha_m (Y_{t-m} - \mu) + Z_t, \qquad (3.2.3)$$

where μ is the mean of $Y_t$, $Z_t$ is a purely random process (Jenkins and Watts [1968]), and m is the order of the process. The general moving average process is given by:

$$Y_t - \mu = Z_t - \beta_1 Z_{t-1} - \cdots - \beta_q Z_{t-q}, \qquad (3.2.4)$$

where μ and $Z_t$ are as defined above, and q is the order of the process. The general mixed autoregressive-moving average process is given by:

$$Y_t - \mu = \alpha_1 (Y_{t-1} - \mu) + \cdots + \alpha_m (Y_{t-m} - \mu) + Z_t - \beta_1 Z_{t-1} - \cdots - \beta_q Z_{t-q}, \qquad (3.2.5)$$

where q is independent of m.

In a recent paper, D'Accardi, Kulinyi, and Tsokos [1971], working with ionospheric information, outlined a procedural approach for fitting the above models. They discuss in detail the criteria for selecting the process and its order (the one that gives the best fit to the observed series), the procedure for estimating its parameters, diagnostic checks of goodness of fit, and how the model can be employed in forecasting ionospheric soundings.
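The estimation procedure of D'Accardi, Kulinyi, and Tsokos is not reproduced here. As a stand-in, the sketch below uses the Yule-Walker (moment) estimator, a standard way to obtain the $\alpha_i$ of (3.2.3) from the sample autocorrelations (3.1.4). The helper name and the simulated AR(2) test series are illustrative assumptions.

```python
import numpy as np

def yule_walker(y, m):
    """Yule-Walker estimates of alpha_1..alpha_m in (3.2.3) and of var(Z_t)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    d = y - y.mean()
    c = np.array([np.sum(d[: n - k] * d[k:]) / n for k in range(m + 1)])  # c_yy(0..m)
    r = c / c[0]                                                           # r_yy(0..m)
    # Toeplitz system: sum_j alpha_j r(|i - j|) = r(i), i = 1..m
    R = np.array([[r[abs(i - j)] for j in range(m)] for i in range(m)])
    alpha = np.linalg.solve(R, r[1:])
    sigma_z2 = c[0] * (1.0 - alpha @ r[1:])     # innovation variance estimate
    return alpha, sigma_z2

# Illustration on a simulated stationary AR(2) series with known coefficients.
rng = np.random.default_rng(42)
true_alpha = np.array([0.5, -0.3])
n = 5000
z = rng.standard_normal(n)
y = np.zeros(n)
for t in range(2, n):
    y[t] = true_alpha[0] * y[t - 1] + true_alpha[1] * y[t - 2] + z[t]
alpha_hat, sigma_z2_hat = yule_walker(y, 2)
```

Yule-Walker is simple but can be less efficient than maximum likelihood for short series such as the 83 filtered soundings; it is shown only to make the fitting step concrete.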

3.3 Forecasting Model for the Average of the Oblique Incidence Soundings for the 60 Km Path:

Following the above procedure, we have fitted a third order autoregressive model to the average oblique incidence soundings of the 60 Km path series. The model that best characterized this series is given by the following difference equation:

$$Y_t = -0.5987\,Y_{t-1} - 0.2034\,Y_{t-2} + 0.1608\,Y_{t-3}, \qquad (3.3.1)$$

where $Y_t = \nabla^2 y_t$. In formulating this model, we used the second order difference filter given by equation (3.2.2), for t = 3, ..., 85, and the estimates for the above model were found to be:

$$\hat\alpha_1 = -0.59867, \qquad \hat\alpha_2 = -0.20345, \qquad \hat\alpha_3 = 0.1608.$$

Figure 4 gives a graphical display of the average of the original oblique incidence soundings of the 60 Km experiment and the predicted values of the series. Note that we began predicting the average oblique soundings after having observed the first four observations and, using this information, we continued to forecast until a difference of 0.5 units occurred between actual and predicted information. Even though our last observed average oblique sounding was recorded at time slot 85, we continued predicting up to time slot 99.
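Because (3.3.1) is written for the second-differenced series, a forecast on the original scale has to undo the differencing: from $Y_t = y_t - 2y_{t-1} + y_{t-2}$, the one-step forecast is $\hat y_t = 2y_{t-1} - y_{t-2} + \hat Y_t$. The sketch below assumes that reading of the model, sets the noise term to zero, and uses a hypothetical helper name; it is not the authors' forecasting program. It needs five past values to form three lagged second differences, and we do not know how the authors initialized their recursion.

```python
import numpy as np

# Estimates reported for model (3.3.1); Y_t is the second difference of y_t.
ALPHA = np.array([-0.59867, -0.20345, 0.1608])

def one_step_forecast(y_hist, alpha=ALPHA):
    """Forecast the next y from past values y_hist (oldest first), assuming the
    AR(3) model (3.3.1) for Y_t = y_t - 2*y_{t-1} + y_{t-2}, with Z_t set to 0."""
    y_hist = np.asarray(y_hist, dtype=float)
    if y_hist.size < 5:
        raise ValueError("three lagged second differences need five past values")
    Y = np.diff(y_hist, n=2)            # second differences, equation (3.2.2)
    Y_lags = Y[::-1][:3]                # Y_{t-1}, Y_{t-2}, Y_{t-3}
    Y_hat = float(alpha @ Y_lags)       # predicted second difference
    return 2.0 * y_hist[-1] - y_hist[-2] + Y_hat   # undo the differencing

print(one_step_forecast([1.0, 2.0, 3.0, 4.0, 5.0]))   # linear history -> 6.0
```

A linear history has zero second differences, so the forecast simply continues the line; any curvature in the recent history is damped according to the fitted coefficients.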

The above model, equation (3.3.1), will be used in a later section, where we will be concerned with the spectral analysis of the oblique ionospheric series.


4. SOME BASIC CONCEPTS OF THE SPECTRAL ANALYSIS OF OBLIQUE INCIDENCE SOUNDINGS

In this section, we shall present a spectral analysis of the oblique incidence soundings of the 60 Km experiment. We shall be using some of the basic concepts of spectral analysis given by Jenkins and Watts [1968] and Box and Jenkins [1971].

4.1  The  Theoretical  Spectrum: 

The Fourier transform of the autocovariance function is called the power spectrum, or spectrum, of the time series, and its plot shows how the variance of the stochastic realization is distributed with respect to frequency. The theoretical spectrum, denoted by $\Gamma_{YY}(f)$, can be written as:

$$\Gamma_{YY}(f) = \sigma_Z^2\,\Delta\,|H(f)|^2, \qquad -\frac{1}{2\Delta} \le f \le \frac{1}{2\Delta}, \qquad (4.1.1)$$

where:

H(f) = frequency response function,
$\sigma_Z^2$ = the variance of a purely random process,
Δ = incrementation interval between observations.

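For the general discrete autoregressive process, equation (3.2.3), the theoretical spectrum is given by:

$$\Gamma_{YY}(f) = \frac{\sigma_Z^2\,\Delta}{\left|1 - \alpha_1 e^{-j2\pi f\Delta} - \cdots - \alpha_m e^{-j2\pi m f\Delta}\right|^2}, \qquad -\frac{1}{2\Delta} \le f \le \frac{1}{2\Delta}. \qquad (4.1.2)$$

For the oblique incidence soundings, we have formulated a third order (m = 3) autoregressive process. Thus, with Δ = 1, $|H(f)|^2$ in equation (4.1.1) can be written as follows:

$$|H(f)|^2 = \frac{1}{1 + \alpha_1^2 + \alpha_2^2 + \alpha_3^2 + (2\alpha_2\alpha_3 - 2\alpha_1 + 2\alpha_1\alpha_2)\cos 2\pi f + (2\alpha_1\alpha_3 - 2\alpha_2)\cos 4\pi f - 2\alpha_3\cos 6\pi f}. \qquad (4.1.3)$$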

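Substituting $\hat\alpha_1$, $\hat\alpha_2$, and $\hat\alpha_3$ for $\alpha_1$, $\alpha_2$, and $\alpha_3$, respectively, and obtaining an estimate of $\sigma_Z^2$ (assuming Δ = 1), we obtained an estimate of the theoretical spectral density function,* $\Gamma_{YY}(f)/\sigma_Y^2$, for the filtered process $Y_t$ of equation (3.3.1). From the process, we obtain:

$$\begin{aligned}
\sigma_Z^2 &= \operatorname{Var}(Z_t) = \operatorname{Var}(Y_t - \alpha_1 Y_{t-1} - \alpha_2 Y_{t-2} - \alpha_3 Y_{t-3}) \\
&= \sigma_Y^2 + \alpha_1^2\sigma_Y^2 + \alpha_2^2\sigma_Y^2 + \alpha_3^2\sigma_Y^2 - 2\alpha_1\operatorname{Cov}(Y_t, Y_{t-1}) - 2\alpha_2\operatorname{Cov}(Y_t, Y_{t-2}) - 2\alpha_3\operatorname{Cov}(Y_t, Y_{t-3}) \\
&\quad + 2\alpha_1\alpha_2\operatorname{Cov}(Y_t, Y_{t-1}) + 2\alpha_2\alpha_3\operatorname{Cov}(Y_t, Y_{t-1}) + 2\alpha_1\alpha_3\operatorname{Cov}(Y_t, Y_{t-2}) \\
&= \sigma_Y^2\left[1 + \alpha_1^2 + \alpha_2^2 + \alpha_3^2 - 2\alpha_1\rho_{YY}(1) - 2\alpha_2\rho_{YY}(2) - 2\alpha_3\rho_{YY}(3) + 2\alpha_1\alpha_2\rho_{YY}(1) + 2\alpha_2\alpha_3\rho_{YY}(1) + 2\alpha_1\alpha_3\rho_{YY}(2)\right]
\end{aligned} \qquad (4.1.4)$$

Hence, substituting the $\hat\alpha_i$, i = 1, 2, 3, and $r_{yy}(k)$, the estimate of $\rho_{YY}(k)$, the estimate of $\sigma_Z^2$ is given by:

$$\hat\sigma_Z^2 = \hat\sigma_Y^2\,(2.6933).$$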
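The bracketed factor in (4.1.4) is simple to evaluate once the $\alpha_i$ and $\rho_{YY}(k)$ are known. The sample autocorrelations of the filtered soundings are not listed in the text, so the sketch below (hypothetical helper name) only checks the formula on an AR(1) case, where theory gives $\sigma_Z^2/\sigma_Y^2 = 1 - \alpha_1^2$.

```python
def innovation_variance_ratio(alpha, rho):
    """sigma_Z^2 / sigma_Y^2 for an AR(3) process, from equation (4.1.4).
    alpha = (alpha_1, alpha_2, alpha_3); rho = (rho(1), rho(2), rho(3))."""
    a1, a2, a3 = alpha
    p1, p2, p3 = rho
    return (1.0 + a1**2 + a2**2 + a3**2
            - 2*a1*p1 - 2*a2*p2 - 2*a3*p3
            + 2*a1*a2*p1 + 2*a2*a3*p1 + 2*a1*a3*p2)

# AR(1) special case (alpha_2 = alpha_3 = 0) with alpha_1 = 0.5, whose
# autocorrelations are rho(k) = 0.5**k; theory gives 1 - 0.5**2 = 0.75.
ratio = innovation_variance_ratio((0.5, 0.0, 0.0), (0.5, 0.25, 0.125))
```

For a true AR(3) process whose autocorrelations satisfy the Yule-Walker equations, the same expression reduces to $1 - \alpha_1\rho(1) - \alpha_2\rho(2) - \alpha_3\rho(3)$.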

Therefore, the estimate of the theoretical spectral density of the autoregressive process fitted to the filtered data is:

$$\frac{\hat\Gamma_{YY}(f)}{\hat\sigma_Y^2} = \frac{2.6933}{1.425 + 1.375\cos 2\pi f + 0.214\cos 4\pi f - 0.322\cos 6\pi f}, \qquad -\tfrac{1}{2} \le f \le \tfrac{1}{2}. \qquad (4.1.5)$$
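The denominator coefficients in (4.1.5) follow from (4.1.3) with the estimates of Section 3.3. The sketch below (hypothetical helper names) recomputes them and cross-checks the cosine form against the complex-exponential definition in (4.1.2); the recomputed values agree with the printed 1.425, 1.375, 0.214, and -0.322 to within about 0.001.

```python
import numpy as np

def ar3_denominator_coeffs(a1, a2, a3):
    """(c0, c1, c2, c3) such that, for Delta = 1,
    |1 - a1 e^{-j2pi f} - a2 e^{-j4pi f} - a3 e^{-j6pi f}|^2
        = c0 + c1 cos(2 pi f) + c2 cos(4 pi f) + c3 cos(6 pi f),  cf. (4.1.3)."""
    c0 = 1 + a1**2 + a2**2 + a3**2
    c1 = 2*a2*a3 - 2*a1 + 2*a1*a2
    c2 = 2*a1*a3 - 2*a2
    c3 = -2*a3
    return c0, c1, c2, c3

alpha = (-0.59867, -0.20345, 0.1608)          # estimates for model (3.3.1)
coeffs = ar3_denominator_coeffs(*alpha)       # about (1.4257, 1.3755, 0.2144, -0.3216)

# Cross-check the closed form against the complex-exponential definition.
f = np.linspace(0.0, 0.5, 101)
direct = np.abs(1 - sum(a * np.exp(-2j * np.pi * (k + 1) * f)
                        for k, a in enumerate(alpha))) ** 2
closed = coeffs[0] + sum(c * np.cos(2 * np.pi * (k + 1) * f)
                         for k, c in enumerate(coeffs[1:]))

# Normalized spectral density (4.1.5), using the constant 2.6933 quoted in the text.
normalized_spectrum = 2.6933 / closed
```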

A smoothed estimate of the theoretical spectral density function can be obtained using the following equation:

$$\bar R_{yy}(l) = 2\left\{1 + 2\sum_{k=1}^{L-1} r_{yy}(k)\,w(k)\cos\frac{\pi l k}{F}\right\}, \qquad l = 0, 1, \ldots, F, \qquad (4.1.6)$$

where F = 2L, L being the truncation length, and w(k) is the lag window. In the above equation, the lag window plays a major role in obtaining a good estimate of the spectral density function.

*Frequently in practice, we have to compare time series with different scales of measurement. In order to do this, it is necessary to normalize the spectrum, that is, simply divide the theoretical spectrum by the variance of the process.

In practice, there are three basic windows that are commonly used, namely those of Bartlett, Tukey, and Parzen. We shall briefly define these windows; for more specific details and their properties, see Jenkins and Watts [1968]. Bartlett's lag window is given by the following expression:

$$w_B(u) = \begin{cases} 1 - \dfrac{|u|}{M}, & |u| \le M, \\ 0, & \text{otherwise}. \end{cases} \qquad (4.1.7)$$

Tukey's lag window is given by:

$$w_T(u) = \begin{cases} \tfrac{1}{2}\left(1 + \cos\dfrac{\pi u}{M}\right), & |u| \le M, \\ 0, & \text{otherwise}. \end{cases} \qquad (4.1.8)$$

Parzen's lag window is given by:

$$w_P(u) = \begin{cases} 1 - 6\left(\dfrac{|u|}{M}\right)^2 + 6\left(\dfrac{|u|}{M}\right)^3, & |u| \le \dfrac{M}{2}, \\ 2\left(1 - \dfrac{|u|}{M}\right)^3, & \dfrac{M}{2} \le |u| \le M, \\ 0, & \text{otherwise}. \end{cases} \qquad (4.1.9)$$

A rectangular window is another alternative not mentioned above, which is defined by:

$$w_R(u) = \begin{cases} 1, & |u| \le M, \\ 0, & \text{otherwise}, \end{cases} \qquad (4.1.10)$$

where M is the truncation point.
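The lag windows (4.1.7) through (4.1.10) and the smoothed estimator (4.1.6) can be sketched together as below, assuming Δ = 1 (so M = L) and reading the cosine argument πlk/F as evaluation at the frequencies $f_l = l/(2F)$ on [0, 1/2]. Function names and the white-noise test series are illustrative, not from the original program.

```python
import numpy as np

def lag_window(u, M, kind):
    """Lag windows (4.1.7)-(4.1.10) evaluated at lags u; zero for |u| > M."""
    a = np.abs(np.asarray(u, dtype=float)) / M
    if kind == "bartlett":
        w = 1.0 - a
    elif kind == "tukey":
        w = 0.5 * (1.0 + np.cos(np.pi * a))
    elif kind == "parzen":
        w = np.where(a <= 0.5, 1.0 - 6.0 * a**2 + 6.0 * a**3, 2.0 * (1.0 - a)**3)
    elif kind == "rectangular":
        w = np.ones_like(a)
    else:
        raise ValueError(f"unknown window: {kind}")
    return np.where(a <= 1.0, w, 0.0)

def smoothed_spectrum(y, L, kind):
    """Normalized smoothed estimate (4.1.6) with Delta = 1 (so M = L),
    at frequencies f_l = l / (2F), l = 0..F, where F = 2L."""
    y = np.asarray(y, dtype=float)
    n = y.size
    d = y - y.mean()
    c = np.array([np.sum(d[: n - k] * d[k:]) / n for k in range(L)])
    r = c / c[0]                                   # r_yy(0..L-1)
    k = np.arange(1, L)
    w = lag_window(k, L, kind)
    F = 2 * L
    l = np.arange(F + 1)
    cosines = np.cos(np.pi * np.outer(l, k) / F)   # cos(pi l k / F)
    R = 2.0 * (1.0 + 2.0 * cosines @ (r[1:] * w))
    return l / (2.0 * F), R

rng = np.random.default_rng(1)
series = rng.standard_normal(200)
freqs, R = smoothed_spectrum(series, L=16, kind="parzen")
```

Because the estimate is normalized by $c_{yy}(0)$, it integrates to one over $0 \le f \le 1/2$, which is a convenient sanity check when switching windows or truncation points.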


Scientists who have been involved in choosing the proper shape of a lag window, w(u), have taken into consideration the fact that the spectral window, W(f), that is, the Fourier transform of the lag window, should be concentrated near zero frequency. Blackman and Tukey [1959], looking at the problem from the communications engineering point of view, essentially identified it with that of choosing the intensity distribution along an antenna so that the radiation will fall in a narrow beam. The principal maximum and the subsidiary extrema of W(f) are called, respectively, the main and side lobes. A window should be an even function so that it treats values of the spectral density function on both sides of a given frequency equally. It should integrate to unity, that is,

$$\int_{-\infty}^{\infty} W(f)\,df = 1,$$

and should achieve its maximum value at the frequency f = 0, that is,

$$|W(f)| \le W(0) \quad \text{for all } f.$$

It should be concentrated as much as possible about f = 0 so that the behavior of the spectral density function in that neighborhood is reflected as closely as possible.
There is no agreed valid criterion for measuring the degree of concentration of a window. One criterion could be the ratio of the size of the second largest peak to the size of the largest peak. However, this would be useful only if the second largest peaks of the windows being compared occurred at the same frequency. This explains why one has to consider all the different windows, not only the most popular ones, in searching for the most appropriate one.

For the main lobe of W(f) to be concentrated, the graph of w(u) should be flat, owing to the way the two functions are related. On the other hand, for the side lobes to be small, w(u) should be smooth and should not change abruptly, as it does in the case of the rectangular window. Therefore, one must compromise. Various authors' analyses have been carried out along these lines, and this is why numerous windows are available from which to choose.


Taking Bartlett's spectral window, $W_B(f)$, as an example, we find that when it is graphed against frequency, it is symmetric about the origin and (taking Δ = 1) has zeros at f = ±1/M, ±2/M, ±3/M, ... .

We shall call the base width the distance between the first zeros on either side of the origin. The base width for Bartlett's window is equal to 2/M. It is inversely proportional to M and, hence, to the variance of the estimate, which increases with M. However, increasing the base width increases the bias, B(f), as well. Thus, we are forced to compromise between bias and variance in choosing a particular window.

The rectangular window is more concentrated about the center frequency than any of the other windows under consideration. Nevertheless, although it has the smallest bandwidth, which implies small bias, it also has the largest side lobes, which makes it impractical. The first side lobe is about 1/5 of the height of the main lobe, which can give an unrealistic picture of the power spectrum estimate.


Thus, in view of the above remarks, for the spectral density function of the oblique incidence soundings, we shall use the Bartlett, Tukey, and Parzen lag windows in search of the best estimate of the spectral density function.

4.2 Estimate of the Spectral Density Function Using Bartlett's Lag Window:

The values of the estimate of the spectral density function using Bartlett's lag window, equation (4.1.7), were calculated and plotted versus frequency for L = 8, 12, 16, 20, 24, 28, and 32 units. As a basis of comparison, we plotted our estimates on the same set of axes for L = 8, 12, 16, 24, and 32. For these values of L, we calculated the bandwidth, the confidence intervals, and the degrees of freedom, which are shown in the following table:


TABLE I: TRUNCATION POINT, BANDWIDTH, DEGREES OF FREEDOM, AND
CONFIDENCE INTERVALS FOR BARTLETT'S LAG WINDOW

   L    Bandwidth   d.f.   95% C.I. for Γ_YY(f)
                            lower     upper
   8      .183       31      .63       1.56
  12      .125       20      .58       2.10
  16      .094       15      .54       2.35
  24      .063       10      .49       3.00
  32      .047        7      .42       4.10

The formula used for the bandwidth of the estimate of the spectral density function is given by:

$$b = \frac{1.5}{L\Delta} = \frac{1.5}{L},$$

and the equation for the degrees of freedom is given by:

$$\nu = 2Tb = 2(83)\,b = 166\,b.$$

Note that since we have chosen Δ = 1, we have L = M.
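The bandwidth and degrees-of-freedom columns of the tables can be reproduced with a few lines. This sketch assumes b = b₁/(LΔ) with b₁ = 1.5 for Bartlett and 1.86 for Parzen, as in the text, and b₁ = 1.33 for Tukey, which is the standard Jenkins and Watts constant rather than a value stated here. It takes T = 83, the value in the expression ν = 2(83)b above, and truncates ν to an integer, since the tabulated values appear truncated rather than rounded. With these assumptions it reproduces the d.f. columns of Tables I through III.

```python
import math

# Standardized bandwidth constants b1, with b = b1 / (L * Delta).
# Bartlett (1.5) and Parzen (1.86) follow the text; Tukey (1.33) is assumed.
B1 = {"bartlett": 1.5, "tukey": 1.33, "parzen": 1.86}

def bandwidth_and_dof(kind, L, T=83, delta=1.0):
    """Bandwidth b and degrees of freedom nu = 2*T*b, truncated to an integer."""
    b = B1[kind] / (L * delta)
    nu = math.floor(2.0 * T * b)     # tabulated d.f. appear truncated, not rounded
    return b, nu

for kind, L in [("bartlett", 16), ("tukey", 14), ("parzen", 20)]:
    b, nu = bandwidth_and_dof(kind, L)
    print(f"{kind:8s} L={L:2d}  b={b:.3f}  d.f.={nu}")
```

The confidence-interval columns additionally require chi-square quantiles with ν degrees of freedom and are not computed here.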

Figure  5  gives  a  comparison  of  the  theoretical  spectral  density 
function  of  the  autoregressive  process  and  its  smoothed  estimate  for  the 
various  truncation  points,  along  with  the  95%  confidence  intervals. 

It is well known that increasing the bandwidth of the estimate of the spectral density increases the bias and decreases the variance; thus, a compromise has to be reached as to the best value of L. In making such a decision, we should take into consideration the confidence interval, the degrees of freedom, and the visual appearance of the plot of our estimate. For L = 8, the plot is very smooth and has a shape very similar to the theoretical spectrum, but the bandwidth is wide enough to conceal any peaks that may be present. Increasing L to 12, we obtain an indication of another peak, at f = 3/16 cycles per second, in addition to the major peak in our theoretical spectral density. The plot is still quite smooth, and the bandwidth is wide enough to give considerable confidence in our estimate. When L is increased to 16, the bandwidth appears to be in a borderline range. However, the curve has changed very little from the one for which L = 12: it displays the peak in the theoretical spectral density and also the extra peak at f = 3/16 cycles per second. Since larger values of L produce many small erratic peaks, we chose L = 16 to estimate the spectral density using Bartlett's lag window.
Figure 5. ESTIMATE OF THE SPECTRAL DENSITY USING THE BARTLETT LAG WINDOW

4.3 Estimate of the Spectral Density Function Using Tukey's Lag Window:

Using Tukey's lag window, given by equation (4.1.8), the smoothed spectral density estimate $\bar R_{yy}(f)$ was calculated for L = 8, 12, 16, 20, 24, 28, and 32 units. In Figure 6, we display the spectral density estimates along with an estimate of the theoretical spectrum of the third order autoregressive process. In addition, we display the 95% confidence intervals and the bandwidths that correspond to the various truncation lengths. It is clear that for L = 8, the sample spectrum has the same general shape as the theoretical spectrum, and the curve is very smooth. When the truncation value is increased to 12, the plot is still fairly smooth, but a peak appears at about f = .186 cycles per second. At L = 16, the peak is slightly more pronounced, and as L is increased to 20 and above, more peaks appear at higher frequencies. This indicates that the variance is increasing and, thus, that the sample spectrum is becoming more erratic for L ≥ 20. On this basis, we decided to seek, with L = 12, 14, or 16, better estimates than those calculated for other values of L, and we also computed the spectral density estimates for L = 14 and 18 units.

Table II displays, for the various truncation points, the bandwidth, degrees of freedom, and confidence intervals using Tukey's lag window.

TABLE II: TRUNCATION POINT, BANDWIDTH, DEGREES OF FREEDOM, AND
CONFIDENCE INTERVALS FOR TUKEY'S LAG WINDOW

   L    Bandwidth   d.f.   95% C.I. for Γ_YY(f)
                            lower     upper
   8      .166       27      .61       1.58
  12      .112       18      .57       2.25
  14      .095       15      .54       2.35
  16      .083       13      .51       2.50
  20      .067       11      .49       2.85
  32      .049        6      .41       4.30

Table II is quite helpful in deciding that L = 14 units gives the best estimate of the spectrum using Tukey's lag window. The degrees of freedom, ν = 15, are sufficient for fairly narrow 95% confidence intervals, and the bandwidth of .095 means that peaks in the spectrum wider than .095 will be detected. Decreasing the bandwidth to .083, that is, L = 16, causes a loss of two degrees of freedom and a slight increase in the confidence interval width. For L = 12, the bandwidth is considerably larger (.112), and there is not much change in the confidence interval even though there are eighteen degrees of freedom. Therefore, a truncation length of 14 units gives the best estimate of the spectrum using Tukey's lag window. Figure 7 shows the spectral density estimates of the filtered data using the Tukey lag window for truncation lengths L = 8, 12, 14, 16, and 32, along with the 95% confidence intervals and the various bandwidths associated with these truncation points. As we mentioned previously, the estimate of the spectral density is plotted on a logarithmic scale to show more detail in the spectrum over a wider amplitude range.

ESTIMATES OF THE SPECTRAL DENSITY USING THE TUKEY LAG WINDOW

4.4 Estimates of the Spectral Density Function Using Parzen's Lag Window:

Using Parzen's lag window, given by equation (4.1.9), we obtained estimates of the spectral density function for various truncation points. As before, we let Δ = 1, so that L = M, the truncation point of the smoothed spectral estimator. We varied L from 8 to 32 in steps of four units.

Figure 8 shows the spectral density estimates of the filtered data for the various truncation points, along with the theoretical spectral density of the third order autoregressive process. In addition, a 95% confidence interval and the corresponding bandwidths are displayed. The bandwidth using Parzen's lag window is given by:

$$b = \frac{1.86}{L\Delta} = \frac{1.86}{L}.$$

The degrees of freedom for the confidence intervals were found using the following relationship:

$$\nu = \frac{2\,b_1 T}{L},$$

where $b_1$ = 1.86 for the Parzen window and T = total number of observations, which in our case is 85 oblique incidence soundings. Table III gives, for the various truncation points, the corresponding bandwidths, degrees of freedom, and a 95% confidence interval for the theoretical spectrum, $\Gamma_{YY}(f)$, for the Parzen lag window.


TABLE III: BANDWIDTH, DEGREES OF FREEDOM, AND 95% CONFIDENCE
INTERVAL FOR SELECTED VALUES OF L FOR PARZEN'S WINDOW

   L    Bandwidth   d.f.   95% C.I. for Γ_YY(f)
                            lower     upper
   8      .233       38      .65       1.54
  16      .116       19      .58       2.20
  20      .093       15      .54       2.35
  24      .078       12      .50       2.75
  32      .058        9      .48       3.30

In  selecting  a  proper  value  for  L  for  our  spectral  density,  we  want 
to  be  able  to  detect  peaks  in  the  spectrum,  have  a  reasonable  confidence 
interval,  and  a  bandwidth  which  affords  us  a  reasonable  bias.  For  an  L  value 
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Figure  8.  ESTIMATE  OF  WE  SPECTRAL  DENSITY 
USIN3  THE  PARZEN  LAG  WINDOW 
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Of  8  units,  the  spectral  density  vas  too  smooth,  and  ve  were  unable  to 
detect  peaks  less  than  0.233  vide.  Increasing  the  L  values  from  l6  to  20 
units,  gives  a  fairly  reasonable  display  of  the  spectral  density,  that  is, 
two  major  peaks  occur  which  are  quite  similar  to  those  of  the  theoretical 
density.  For  a  truncation  point  of  2k  units,  very  small  peaks  begin  to 
appear  which  indicate  that  the  variance  may  be  influencing  the  density. 

This was also seen at L = 28 and 32, where the peaks became very erratic and very noticeable. Thus, our choice was narrowed very quickly to L = 16 or 20 units. The confidence intervals for L = 16 and L = 20 units are almost identical. The bandwidth for L = 20, however, is about 20% smaller than that for L = 16. Therefore, the spectral density corresponding to L = 20 units was selected as the most reasonable truncation point. The spectral density estimate clearly shows that most of the power is concentrated at high frequencies. A major peak is located at f = .375 cycles per second, with a smaller peak at .18 cycles per second. The bandwidth for L = 20 units is .093, which means that we can detect peaks with a width of this value or greater. These remarks are verified graphically in Figure 8, where the theoretical spectral density for the third-order autoregressive model is compared with the spectral estimates for L = 8, 16, 20, 24, and 32 units. The figure also gives the 95% confidence intervals and the corresponding bandwidths for these truncation points.
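The bandwidth arithmetic behind this choice can be reproduced in a few lines. The sketch below assumes the Parzen relation b = 1.86/L and the degrees-of-freedom relation v = 166b, both quoted later in this paper. We read the second relation as v = 2Nb with a record length N = 83; that value is our inference, not something stated here. The function names are illustrative.

```python
# Parzen lag window: bandwidth b = 1.86 / L, degrees of freedom v = 2 * N * b.
# N = 83 is an assumption, inferred from the relation v = 166 b (= 2 * 83 * b).
N = 83
PARZEN_B1 = 1.86  # standardized bandwidth of the Parzen window


def parzen_bandwidth(L: int) -> float:
    """Bandwidth b = 1.86 / L for truncation point L."""
    return PARZEN_B1 / L


def degrees_of_freedom(b: float, n: int = N) -> float:
    """Approximate equivalent degrees of freedom v = 2 n b."""
    return 2 * n * b


for L in (8, 16, 20, 24, 32):
    b = parzen_bandwidth(L)
    print(f"L = {L:2d}   b = {b:.3f}   v = {degrees_of_freedom(b):.1f}")

# Fractional bandwidth reduction when L is raised from 16 to 20 (equals 1 - 16/20).
reduction = 1 - parzen_bandwidth(20) / parzen_bandwidth(16)
print(f"bandwidth reduction from L = 16 to L = 20: {reduction:.0%}")
```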


5. BIVARIATE SPECTRAL ANALYSIS OF THE OI SOUNDINGS:

In this section, we shall be concerned with analyzing the bivariate behavior of the oblique and vertical incidence ionospheric soundings for the 60 Km experiment. More specifically, we shall obtain estimates of the smoothed coquadrature, phase, and cross-amplitude spectra using the three lag windows discussed in Section 4. In addition, we shall obtain estimates of the coherency spectrum.

With respect to the aims of the present study, we will give only the equations (estimates) which characterize the above concepts and will not discuss their theoretical implications. For complete details of these concepts, see Jenkins and Watts [1968] and Box and Jenkins [1970].


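The sample cross-correlation function is defined by:

$$r_{yx}(k) = \frac{c_{yx}(k)}{\sqrt{c_{yy}(0)\,c_{xx}(0)}} \qquad (5.1.1)$$

where:

$$c_{yx}(k) = \frac{1}{N}\sum_{t=1}^{N-k}\left(Y_t - \bar{Y}\right)\left(X_{t+k} - \bar{X}\right), \qquad 0 \le k \le L-1 \qquad (5.1.2)$$

and $c_{yx}(-k) = c_{xy}(k)$ is obtained by interchanging the roles of the two series.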
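Here is a minimal pure-Python sketch of (5.1.1) and (5.1.2). It assumes two equal-length series and the divisor N used above. Negative lags use the identity $c_{yx}(-k) = c_{xy}(k)$. The names `cross_covariance` and `cross_correlation` are hypothetical.

```python
import math
from typing import Sequence


def cross_covariance(y: Sequence[float], x: Sequence[float], k: int) -> float:
    """Sample cross-covariance c_yx(k) as in (5.1.2), using divisor N.

    For k >= 0 it pairs Y_t with X_{t+k}; for k < 0 it uses c_yx(k) = c_xy(-k).
    """
    n = len(y)
    if len(x) != n:
        raise ValueError("series must have the same length")
    if k < 0:
        return cross_covariance(x, y, -k)
    y_bar = sum(y) / n
    x_bar = sum(x) / n
    return sum((y[t] - y_bar) * (x[t + k] - x_bar) for t in range(n - k)) / n


def cross_correlation(y: Sequence[float], x: Sequence[float], k: int) -> float:
    """Sample cross-correlation r_yx(k) as in (5.1.1)."""
    c_yy0 = cross_covariance(y, y, 0)
    c_xx0 = cross_covariance(x, x, 0)
    return cross_covariance(y, x, k) / math.sqrt(c_yy0 * c_xx0)


y = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
x = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]
print([round(cross_correlation(y, x, k), 3) for k in range(-2, 3)])
```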


As in the univariate case, the sample cross spectrum is obtained by taking the Fourier transform of the sample cross covariance function. The sample cospectral estimate is given by:

$$L_{yx}(i) = 2\left\{ l_{yx}(0) + 2\sum_{k=1}^{L-1} l_{yx}(k)\,w(k)\cos\frac{\pi i k}{F} \right\}, \qquad 0 \le i \le F \qquad (5.1.3)$$

where:

$$l_{yx}(k) = \tfrac{1}{2}\left\{ c_{yx}(k) + c_{yx}(-k) \right\}, \qquad 0 \le k \le L-1 \qquad (5.1.4)$$


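The equation used to calculate the quadrature spectral estimate is given by:

$$Q_{yx}(i) = 4\sum_{k=1}^{L-1} q_{yx}(k)\,w(k)\sin\frac{\pi i k}{F}, \qquad 1 \le i \le F-1 \qquad (5.1.5)$$

where:

$$q_{yx}(k) = \tfrac{1}{2}\left\{ c_{yx}(k) - c_{yx}(-k) \right\}, \qquad 0 \le k \le L-1 \qquad (5.1.6)$$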
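Equations (5.1.3) to (5.1.6) can be sketched as follows. The caller supplies $c_{yx}(k)$ and $c_{yx}(-k)$ for k = 0, ..., L-1, together with the lag-window weights w(k). We read the frequency index as $f_i = i/(2F)$ cycles per sampling interval, so that $\pi i k/F = 2\pi f_i k$; that reading is our assumption. The function name is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def co_quad_spectra(
    c_pos: Sequence[float],
    c_neg: Sequence[float],
    w: Sequence[float],
    F: int,
) -> Tuple[List[float], List[float]]:
    """Smoothed co- and quadrature spectra, following (5.1.3)-(5.1.6).

    c_pos[k] = c_yx(k) and c_neg[k] = c_yx(-k) for k = 0..L-1, and w[k] is the
    lag window. Returns (L_yx(i), Q_yx(i)) for i = 0..F.
    """
    L = len(w)
    if len(c_pos) != L or len(c_neg) != L:
        raise ValueError("c_pos, c_neg and w must all have length L")
    even = [0.5 * (c_pos[k] + c_neg[k]) for k in range(L)]  # l_yx(k), (5.1.4)
    odd = [0.5 * (c_pos[k] - c_neg[k]) for k in range(L)]   # q_yx(k), (5.1.6)
    co, quad = [], []
    for i in range(F + 1):
        cos_sum = sum(even[k] * w[k] * math.cos(math.pi * i * k / F) for k in range(1, L))
        sin_sum = sum(odd[k] * w[k] * math.sin(math.pi * i * k / F) for k in range(1, L))
        co.append(2.0 * (even[0] + 2.0 * cos_sum))  # (5.1.3)
        # (5.1.5); at i = 0 and i = F the sine terms vanish, up to rounding
        quad.append(4.0 * sin_sum)
    return co, quad
```

If the cross-covariance is symmetric in k, the odd part is zero and the quadrature spectrum vanishes at every frequency.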


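Note that $Q_{yx}(0) = Q_{yx}(F) = 0$. The smoothed cross-amplitude spectral estimate was calculated using the following equation:

$$A_{yx}(i) = \sqrt{L_{yx}^2(i) + Q_{yx}^2(i)}, \qquad 0 \le i \le F \qquad (5.1.7)$$

The smoothed phase spectral estimate is given by:

$$F_{yx}(i) = \arctan\left[\frac{-Q_{yx}(i)}{L_{yx}(i)}\right], \qquad 0 \le i \le F \qquad (5.1.8)$$

where $Q_{yx}(i)$ and $L_{yx}(i)$ are as previously defined. The smoothed squared coherency spectral estimate is given by:

$$K_{yx}^2(i) = \frac{A_{yx}^2(i)}{C_{yy}(i)\,C_{xx}(i)}, \qquad 0 \le i \le F \qquad (5.1.9)$$

where $A_{yx}(i)$ is the smoothed cross-amplitude spectral estimate and $C_{yy}(i)$ and $C_{xx}(i)$ are the smoothed spectral estimates given by:

$$C_{yy}(i) = 2\left\{ c_{yy}(0) + 2\sum_{k=1}^{L-1} c_{yy}(k)\,w(k)\cos\frac{\pi i k}{F} \right\}, \qquad 0 \le i \le F \qquad (5.1.10)$$

and

$$C_{xx}(i) = 2\left\{ c_{xx}(0) + 2\sum_{k=1}^{L-1} c_{xx}(k)\,w(k)\cos\frac{\pi i k}{F} \right\}, \qquad 0 \le i \le F \qquad (5.1.11)$$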
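Given the co-, quadrature, and autospectral estimates at each frequency index, (5.1.7) to (5.1.9) reduce to pointwise arithmetic. In this sketch the phase is computed with `math.atan2`, which resolves the quadrant that a plain arctangent leaves ambiguous. That choice and the function names are ours, not the paper's.

```python
import math
from typing import List, Sequence


def cross_amplitude(co: Sequence[float], quad: Sequence[float]) -> List[float]:
    """A_yx(i) = sqrt(L_yx(i)^2 + Q_yx(i)^2), eq. (5.1.7)."""
    return [math.hypot(l, q) for l, q in zip(co, quad)]


def phase(co: Sequence[float], quad: Sequence[float]) -> List[float]:
    """F_yx(i) = arctan(-Q_yx(i) / L_yx(i)), eq. (5.1.8), in radians.

    atan2 places the angle in the correct quadrant and tolerates L_yx(i) = 0.
    """
    return [math.atan2(-q, l) for l, q in zip(co, quad)]


def squared_coherency(
    co: Sequence[float],
    quad: Sequence[float],
    cyy: Sequence[float],
    cxx: Sequence[float],
) -> List[float]:
    """K^2_yx(i) = A_yx(i)^2 / (C_yy(i) C_xx(i)), eq. (5.1.9)."""
    return [(l * l + q * q) / (sy * sx) for l, q, sy, sx in zip(co, quad, cyy, cxx)]


print(cross_amplitude([3.0], [4.0]), phase([3.0], [4.0]),
      squared_coherency([3.0], [4.0], [10.0], [5.0]))
```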

Having calculated and plotted the cross amplitude spectrum, we can detect whether or not frequency components in the vertical incidence soundings are associated with large or small amplitudes at the same frequency in the oblique incidence series. The estimate of the phase spectrum of the two stochastic realizations helps us determine whether or not frequency components in the vertical incidence series are in phase or out of phase (lag or lead) with components at the same frequency in the oblique incidence series.
Estimates of the cross-amplitude spectrum and the phase spectrum would suffice to provide a complete description of the behavior of the two series. The squared coherency spectrum is the plot of $K_{yx}^2(f)$ vs. frequency. The cross amplitude spectrum, $A_{yx}(f)$, is a measure of the covariance between the oblique and vertical incidence soundings at frequency f. In general, the coherency spectrum plays the role of a correlation coefficient with respect to frequency. Its usefulness lies in the fact that it is dimensionless: the units of the two series do not enter when the correlation is measured with respect to frequency. Unlike the squared coherency spectrum, the cross amplitude spectrum depends upon the units of the oblique and vertical incidence soundings. This is why the squared coherency spectrum is sometimes preferred over the cross amplitude spectrum. Together with the phase spectrum, it gives a complete picture of the cross correlation behavior of the oblique and vertical incidence soundings.

We shall, in what follows, obtain the coquadrature, phase, and cross amplitude spectral estimates using Bartlett's lag window. These smoothed estimates were obtained using the truncation points L = M = 8, 12, 16, 20, and 24 units for the cross spectral estimate and L = M = 8, 12, 16, 24, and 32 units for the smoothed coquadrature spectral estimate. Increasing the truncation point decreases the bandwidth, which for the Bartlett window is b = 1.5/L.
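The section does not restate the lag windows themselves. For reference, here is a sketch of the standard Bartlett, Tukey, and Parzen lag windows as given by Jenkins and Watts [1968]. Their standardized bandwidths are 1.5, 1.33, and 1.86, so that b = b_1/L. The function names are illustrative.

```python
import math
from typing import Callable, List


def bartlett(k: int, L: int) -> float:
    """Bartlett lag window: 1 - |k|/L for |k| <= L, else 0."""
    u = abs(k) / L
    return 1.0 - u if u <= 1.0 else 0.0


def tukey(k: int, L: int) -> float:
    """Tukey lag window: (1 + cos(pi k / L)) / 2 for |k| <= L, else 0."""
    u = abs(k) / L
    return 0.5 * (1.0 + math.cos(math.pi * u)) if u <= 1.0 else 0.0


def parzen(k: int, L: int) -> float:
    """Parzen lag window (piecewise cubic), zero for |k| > L."""
    u = abs(k) / L
    if u <= 0.5:
        return 1.0 - 6.0 * u ** 2 + 6.0 * u ** 3
    if u <= 1.0:
        return 2.0 * (1.0 - u) ** 3
    return 0.0


# Standardized bandwidths b_1; the bandwidth for truncation point L is b = b_1 / L.
STANDARDIZED_BANDWIDTH = {"bartlett": 1.5, "tukey": 1.33, "parzen": 1.86}


def lag_weights(window: Callable[[int, int], float], L: int) -> List[float]:
    """Weights w(0), ..., w(L-1) as used in the sums over k = 1..L-1."""
    return [window(k, L) for k in range(L)]


print(lag_weights(bartlett, 4))
print(STANDARDIZED_BANDWIDTH["bartlett"] / 16)  # Bartlett bandwidth for L = 16
```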

Figure 9 shows the smoothed cospectral estimates. Similarly, Figure 10 shows, on the same axes, the various smoothed quadrature spectral estimates. It is clear that for L = 20 units, the estimates in both cases, i.e., vertical and oblique, become very erratic. As we mentioned previously, compromising between bias and variance, it appears that L = 16 units gives the best estimate using Bartlett's lag window, with b = .094 and v = 15 degrees of freedom. The smoothed phase spectral estimate and the smoothed cross spectral estimate for L = 16 are shown in Figures 11 and 12, respectively. Each is plotted on a separate set of axes to enhance the details of the series.

The smoothed coquadrature, phase, and cross amplitude spectral estimates were similarly obtained using Tukey's lag window for truncation points L = 8, 12, 14, 16, and 32. Figure 13 displays the smoothed cospectral estimates. The smoothed quadrature spectral estimates are plotted in Figure 14 for the same truncation points. In both cases, the estimates become more erratic as L is increased beyond 20 units. Taking the bandwidth into consideration, we choose the estimate for L = 14 units as the best compromise between bias and variance. The resulting bandwidth is b = 1.33/L = .095 for L = 14, with v = 15 degrees of freedom for the Tukey lag window. Decreasing b to .083 (L = 16) decreases the degrees of freedom considerably. Therefore, L = 14 units gives the best estimate of the co- and quadrature spectra for the Tukey lag window.

The smoothed phase and smoothed cross amplitude spectra were then plotted for L = 14 units to enhance the details. Figures 15 and 16 display the smoothed cross amplitude spectral estimate and the smoothed phase spectral estimate, respectively, using the Tukey lag window for L = 14.

A similar analysis was performed to obtain smoothed estimates of the co- and quadrature spectra using Parzen's lag window for L = 8, 16, 20, 24, and 32 units. Figures 17 and 18 display these smoothed estimates.
... QUADRATURE SPECTRAL ESTIMATES
USING THE TUKEY LAG WINDOW

Figure 16. SMOOTHED PHASE SPECTRAL ESTIMATE
USING THE TUKEY LAG WINDOW

... QUADRATURE SPECTRAL ESTIMATES
... THE PARZEN LAG WINDOW
The bandwidths for the Parzen lag window are given by b = 1.86/L, and the degrees of freedom can be obtained from v = 166b. For L = 24 and 32, the estimates become quite erratic, and both the bandwidth and the degrees of freedom decrease. The decrease in bandwidth from .093 to .078 for L = 20 and 24 units, respectively, is not worth the accompanying increase in variance. Hence, we choose L = 20 for our best estimates of the co- and quadrature spectra. This gives a bandwidth of b = .093. Figures 19 and 20 show the smoothed phase and cross amplitude spectral estimates, respectively, for L = 20, using the Parzen lag window, along with their corresponding bandwidth.

To compare the estimates obtained for the Bartlett, Tukey, and Parzen lag windows, the estimates corresponding to the best value of L (chosen for each window) were plotted on the same axes; see Figure 21. The estimates for the co- and quadrature spectra coincided almost exactly. Each estimate has 15 degrees of freedom for the autospectrum analysis. The Parzen lag window has a slightly smaller bandwidth than the others. It was difficult to choose the best window. Since Parzen's lag window for L = 20 units gave a bandwidth of .093, we chose it as the best smoothed estimate of the co- and quadrature spectra. The smoothed estimates for the phase and cross amplitude spectra are also best represented by this lag window for L = 20 units.

The smoothed sample cospectral estimate estimates the covariance due to the in-phase components. There is a peak at about 0.2 cycles per second, which corresponds to the peaks in the autospectra, because the variance is a special case of the covariance. At frequencies below 0.125 cycles per second, the covariance between the vertical and oblique incidence realizations is reasonably small and constant over the frequency range 0 to .125 cycles per second. The variance at most frequencies in the autospectra is fairly large. However, the covariance distribution of the in-phase components of the filtered ionospheric series is small, and therefore the in-phase components of the two series are not very dependent. The larger value of the sample cospectrum is near .375 cycles per second. It corresponds to autospectral variance values of about 10 at the same frequency for the Parzen lag window, L = 20 units. Hence, the correlation is small, as will be verified by the squared coherency spectral estimate.
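The comparison of the three windows at their chosen truncation points can be checked numerically. This sketch assumes v ≈ 2Nb with N = 83, inferred from the Parzen relation v = 166b quoted above. With the standardized bandwidths 1.5, 1.33, and 1.86, this relation reproduces the Jenkins and Watts factors 3N/L, 2.67N/L, and 3.71N/L to within rounding. The names are illustrative.

```python
from typing import Dict

N = 83  # assumed record length, inferred from v = 166 b for the Parzen window
B1: Dict[str, float] = {"Bartlett": 1.5, "Tukey": 1.33, "Parzen": 1.86}
CHOSEN_L: Dict[str, int] = {"Bartlett": 16, "Tukey": 14, "Parzen": 20}


def bandwidth(window: str, L: int) -> float:
    """b = b_1 / L for the named window."""
    return B1[window] / L


def dof(window: str, L: int, n: int = N) -> float:
    """Approximate degrees of freedom v = 2 n b."""
    return 2 * n * bandwidth(window, L)


for name, L in CHOSEN_L.items():
    print(f"{name:8s}  L = {L:2d}  b = {bandwidth(name, L):.4f}  v = {dof(name, L):.1f}")

narrowest = min(CHOSEN_L, key=lambda name: bandwidth(name, CHOSEN_L[name]))
print("narrowest bandwidth:", narrowest)
```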

The smoothed quadrature spectral estimate estimates the covariance of the out-of-phase components of the two filtered time series. It also shows that the covariance between the out-of-phase components of the two filtered series is small and, hence, that they are not very correlated. The largest value is 0.041 for the chosen lag window (Parzen, L = 20), and the smallest value is -.025. There is little or no covariance in the range 0 to 0.25 cps, but the out-of-phase components begin to vary in a sinusoidal manner at high frequencies.

The smoothed phase spectral estimate estimates the phase angle, in radians, by which one filtered time series leads or lags the other. At frequencies 0 to .0625 cps, the phases are approximately the same (the phase spectral estimate is near 0). At frequencies between .0625 cps and 0.13 cps, the in-phase components of the two time series lag the out-of-phase components very slightly. From 0.13 cps to approximately 0.27 cps, the out-of-phase components lag the in-phase components. From 0.27 cps to 0.35 cps, the in-phase components are lagging, and from 0.35 cps to 0.5 cps, the out-of-phase components lag the in-phase components of the two time series. Since the phases alternate in leading, there is no reason to assume or conclude that one time series leads or lags the other at all frequencies.
Figure 19. SMOOTHED PHASE SPECTRAL ESTIMATE
USING THE PARZEN LAG WINDOW
The smoothed cross amplitude spectral estimate shows whether or not the amplitude of the components at a particular frequency in one time series is associated with a large or small amplitude of the same order at the same frequency in the other time series. The spectral density of the autospectra shows that the variance is about 10 in both the filtered x_t and y_t series. Hence, at frequencies from 0.3 cps to 0.4 cps, the amplitude of the components of one time series is associated with corresponding large or small amplitudes at the same frequency in the other. Again, this seems to indicate that the covariance between the component amplitudes is near zero at other frequencies. Figure 21 displays the best smoothed cospectral estimate, and Figure 22 shows the best smoothed quadrature spectral estimate.

6. SUMMARY AND CONCLUSIONS

A plot, see Figure 21, is given for the selected best estimates of the spectral densities for each of the three lag windows, namely, those of Bartlett, Tukey, and Parzen. Although the truncation point is different for each lag window, the bandwidth, degrees of freedom, and confidence intervals are almost identical. Thus, it is quite difficult to choose which lag window gives the best smoothed estimate of the spectral density function. However, after calculating the approximate bias for each of the above lag windows, we found that the bias for Parzen's lag window is somewhat smaller than that for the Tukey and Bartlett lag windows. That is:

Furthermore, the variance ratio, that is, the ratio of the variance of the smoothed estimator to that of the sample spectrum, is approximately 0.128. On the basis of these two criteria, we chose Parzen's lag window for the best estimate of the spectral density. In addition, the bandwidth of this lag window is slightly smaller than that of the Tukey and Bartlett lag windows. Therefore, the best estimate of the spectral density of the average oblique incidence soundings was obtained using Parzen's lag window for L = 20 units. This value of L resulted in a 95% confidence interval width of 2.25 with 15 degrees of freedom, and a bandwidth of b = .093. The bandwidth is less than 1/5 of the total frequency range over which the spectral density function is estimated. Since we can detect peaks with widths of .093 or more, the two peaks appearing in the estimated spectral density at frequencies f = 3/16 cps and f = 3/8 cps are valid peaks. They should be taken into consideration in interpreting the behavior of the average oblique incidence soundings. The process generating the resultant soundings exhibits large variance around these two frequencies for the filtered data, and this information should be taken into account in the design of the system. In the spectral estimates, frequencies below f = .125 cps have the lowest power, that is, the least variance.


The Parzen lag window for L = 20 units and b = .093 was used to obtain smoothed estimates of the co- and quadrature spectra. The smoothed estimates of the phase and cross amplitude spectra were also obtained using the same lag window and L = 20 units.

The smoothed sample cospectral estimate estimates the covariance due to the in-phase components. There is a peak at about .20 cps and one at .375 cps, which correspond to the peaks in the autospectra. At frequencies less than .125 cps, the covariance is reasonably small and approximately constant over the frequency range of 0 to .125 cps. The variance at most frequencies in the autospectra is fairly large. However, the covariance distribution of the in-phase components of the two filtered series is small, so the in-phase components of the two soundings series are not very dependent. The larger value of the sample cospectrum is near .375 cps, corresponding to autospectral variance values of about 10 at the same frequency, using the Parzen lag window for L = 20 units. Hence, the correlation between the average oblique and vertical incidence soundings is small, as was verified by the squared coherency spectral estimate.

The smoothed quadrature spectral estimate estimates the covariance of the out-of-phase components of the filtered oblique and vertical incidence soundings. It showed that the covariance between the out-of-phase components of the two filtered series is small and, hence, that they are not very correlated. The largest value is .041 for the chosen lag window, and the smallest value is -.025. There is little or no covariance in the range from 0 to .25 cps, but the out-of-phase components begin to vary in a sinusoidal manner at higher frequencies.

The smoothed phase spectral estimate estimates the phase angle, in radians, by which one filtered time series leads or lags the other. At frequencies 0 to .0625 cps, the phases are approximately the same; that is, the phase spectral estimate is near zero. At frequencies between .0625 cps and .13 cps, the in-phase components of the two time series lag the out-of-phase components very slightly. From .13 cps to approximately .27 cps, the out-of-phase components lag the in-phase components. From .27 cps to .35 cps, the in-phase components are lagging, and from .35 cps to .50 cps, the out-of-phase components lag the in-phase components of the two time series (the average oblique and vertical incidence soundings). Since the phase alternates in leading, there is no reason to assume or conclude that one time series leads or lags the other at all frequencies.

The smoothed cross amplitude spectral estimate shows whether or not the amplitude of the components at a particular frequency in one time series is associated with a large or small amplitude of the same order, at the same frequency, in the other time series. The spectral density of the autospectra shows that the variance is about 10 in both the filtered average oblique incidence and filtered average vertical incidence soundings. Hence, at frequencies from .30 cps to .40 cps, the amplitude of the components of one time series is associated with corresponding large or small amplitudes (at the same frequency) of the other. Again, this indicates that the covariance between the component amplitudes is near zero at other frequencies. Figures 21 and 22 display the best smoothed cospectral and quadrature spectral estimates.

To obtain a better representation of the important peaks and a confidence interval, the squared coherency was calculated and plotted (see Figure 23) on the transformed scale of Jenkins and Watts [1968], given by:

$$y(f) = \operatorname{arctanh}\left|\hat{K}_{yx}(f)\right|$$
A 95% confidence interval was obtained using the following expression:

$$y_{.95}(f) = \pm 1.96\sqrt{\frac{L}{2(1.86)N}} = \pm 1.96\sqrt{\frac{20}{2(1.86)(83)}} \approx \pm .499,$$

and is shown on the graph of the smoothed squared coherency spectrum. This squared coherency spectral estimate gives the correlation between the average oblique incidence soundings and the average vertical incidence soundings for the 60 Km experiment. At low frequencies, we have almost perfect correlation between the two filtered series, but it drops to near zero at about .25 cps, and again at .50 cps. Furthermore, it never becomes greater than .33 ... cps. This frequency range shows virtually no correlation. Between these two frequencies, .25 cps and .50 cps, the squared coherency is near zero. This indicates that the noise level is high in the filtered series for components in this frequency range. This is consistent with the results of the autospectral analysis; that is, the distribution of power or variance is larger at high frequencies (between .25 cps and .50 cps). At low frequencies, the squared coherency is high, which indicates low noise or variance in the autospectra at the corresponding frequencies. Again, this agrees with the result obtained in the autospectral analysis.
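The interval above has a fixed half-width on the arctanh scale, so it is easy to compute and to map back to the coherency scale with tanh. This sketch assumes the Parzen constant 1.86, N = 83, and L = 20, as in the expression above. The back-transformation and the function names are our additions for illustration.

```python
import math
from typing import Tuple


def arctanh_halfwidth(L: int, n: int, b1: float = 1.86, z: float = 1.96) -> float:
    """Half-width z * sqrt(L / (2 b1 n)) of the band on y(f) = arctanh|K(f)|."""
    return z * math.sqrt(L / (2.0 * b1 * n))


def coherency_band(k_hat: float, halfwidth: float) -> Tuple[float, float]:
    """Map arctanh(k_hat) +/- halfwidth back to the coherency scale (0 <= k_hat < 1)."""
    y = math.atanh(k_hat)
    lower = max(0.0, math.tanh(y - halfwidth))  # a coherency magnitude cannot be negative
    upper = math.tanh(y + halfwidth)
    return lower, upper


h = arctanh_halfwidth(L=20, n=83)
print(round(h, 3))  # 0.499
print(coherency_band(0.8, h))
```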


REFERENCES 


D'Accardi, R. J., C. P. Tsokos, and R. A. Kulinyi [1971]. "Statistical Models
for HF Ionospheric Forecasting for Field Army Distances." Vol.
Proceedings of the 17th Conference on Design of Experiments and Testing
in Army R&D.

Ames, J. W. and R. D. Egan [1967]. "Digital Recording and Short Term
Prediction of Oblique Ionospheric Propagation." IEEE Transactions,
Vol. AP-15.

Krause,  G.  E.  et  al  [1970].  "Field  Test  of  a  Near  Real  Time  Ionospheric 
Forecasting  Scheme  (60  Km)."  ECOM  Technical  Report  #33^5 • 

Jenkins, G. M., and D. G. Watts [1968]. Spectral Analysis and Its
Applications. Holden-Day, San Francisco.

Box,  G.  E.  P.  and  G.  M.  Jenkins  [1970].  Time  Series  Analysis,  Forecasting 
and  Control.  Holden-Day,  San  Francisco. 

Kendall, M. G. and A. Stuart [1966]. The Advanced Theory of Statistics.
Vol. 3. Griffin, London.

Blackman,  R.  B.  and  J.  W.  Tukey  [1959].  Measurement  of  Power  Spectra  from 
the  Point  of  View  of  Communication  Engineering.  Dover,  New  York. 




U.  S.  Army  Materiel  Systems  Analysis  Agency 
Aberdeen  Proving  Ground,  Maryland 

MAXIMUM  LIKELIHOOD  ESTIMATION  PROCEDURES  IN  RELIABILITY  GROWTH 

Larry  H.  Crow 


Introduction 


A  development  program  is  generally  recognized  as  being  a  necessity  for 
most  systems  since  they  usually  exhibit  initial  design  and  engineering 
deficiencies.  Attempts  are  made  during  the  development  program  to  find  and 
remove  these  deficiencies  to  a  point  where  certain  levels  of  performance  with 
respect  to  reliability  and  other  requirements  are  met. 

The development of a system usually evolves as a repeated process of system examination and testing, determination of system failure modes, and design and engineering changes intended to eliminate these modes. Because data are scarce, it is often difficult to obtain good direct estimates of the progress of the development program and to project future progress. In this regard, program managers generally need specialized techniques and methodology which allow them to evaluate the progress of the development program from a limited amount of test data. Reliability growth modeling is a management tool directed toward this need of the program managers.


It is usually assumed that the system reliability will increase during the development program and, thus, mathematical models describing this phenomenon have come to be called "reliability growth" models. Most of the reliability growth models considered in the literature assume that a mathematical formula (or curve), as a function of time, represents the reliability of the system during the development program. It is also commonly assumed that these curves are nondecreasing. That is, once the system's reliability has reached a certain level, it will not drop below this level during the remainder of the development program. It is important to note that this is equivalent to assuming that any design or engineering changes made during the development program do not decrease the system's reliability.

The  central  purpose  of  most  reliability  growth  models  includes  one  or  both 
of  the  following  objectives: 

• Inference on the present system reliability;

• Projection of the system reliability at some future development time.

This paper will consider a commonly used reliability growth model proposed by Duane [1]. For this model, maximum likelihood estimates of the unknown parameters will be given, along with appropriate confidence interval and hypothesis testing procedures.



Observe that for $\beta = 1$, $r(x)$ is constant. If $\beta > 1$ ($\beta < 1$), then $r(x)$ is increasing (decreasing), which implies that the system is wearing out (improving) with age.

The  above  definition  of  failure  rate  is  appropriate  when  one  is  interested 
primarily  in  time  to  first  failure.  However,  during  a  development  program  the 
system  is  repaired  or  modified  after  each  failure  and  tested  further. 

The failure rate of a (complex) repairable system may be defined by

$r(x)\,dx \approx$ (unconditional) probability that a system of age $x$ will fail in $(x, x+dx)$.

This probability is independent of the failure history of the system during $[0, x]$. Again, if $\beta > 1$ ($\beta < 1$), then $r(x)$ is increasing (decreasing), which implies that the system is wearing out (improving) with age.

Examples 

1. Constant failure rate

$$r(x) = \lambda, \qquad \lambda > 0,\ x > 0.$$

2. Weibull failure rate

$$r(x) = \lambda\beta x^{\beta-1}, \qquad \lambda > 0,\ \beta > 0,\ x > 0.$$


The Duane Reliability Growth Model [1] is usually written as

$$r(x) = (1-\alpha)\lambda x^{-\alpha},$$

$x > 0$, $\lambda > 0$, $0 < \alpha < 1$, where $r(x)$ is the failure rate of a repairable system. Replacing $-\alpha$ by $\beta - 1$, we see that the Duane model and the Weibull repairable-system failure rate model are the same. For a system with a constant failure rate for a fixed configuration, this model is equivalent to assuming that the mean time between failures (MTBF) of the system at time $x$ is

$$M(x) = [r(x)]^{-1} = \frac{x^{1-\beta}}{\lambda\beta}.$$

That is, the MTBF is proportional to $x^{1-\beta}$.
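The equivalence of the Duane and Weibull forms under $\alpha = 1 - \beta$ is easy to check numerically. Here is a minimal sketch with illustrative function names; it also shows that the MTBF grows with age when $\beta < 1$.

```python
def weibull_rate(x: float, lam: float, beta: float) -> float:
    """Weibull repairable-system failure rate r(x) = lam * beta * x**(beta - 1)."""
    return lam * beta * x ** (beta - 1.0)


def duane_rate(x: float, lam: float, alpha: float) -> float:
    """Duane form r(x) = (1 - alpha) * lam * x**(-alpha), with 0 < alpha < 1."""
    return (1.0 - alpha) * lam * x ** (-alpha)


def mtbf(x: float, lam: float, beta: float) -> float:
    """M(x) = 1 / r(x) = x**(1 - beta) / (lam * beta)."""
    return 1.0 / weibull_rate(x, lam, beta)


# With alpha = 1 - beta the two rates coincide; for beta < 1 the MTBF grows with age.
for x in (50.0, 200.0, 300.0):
    print(x, duane_rate(x, 0.6, 0.5), weibull_rate(x, 0.6, 0.5), mtbf(x, 0.6, 0.5))
```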
Maximum Likelihood Estimates of $\lambda$ and $\beta$


Suppose K systems have each experienced T units of operation since the development program began. Let $N_r(T)$ be the random number of failures observed for the r-th system, $r = 1, \ldots, K$. Let $X_{ir}$ be the age of the r-th system (regarding the age at the beginning of development as 0) at the i-th failure, $i = 1, \ldots, N_r(T)$, $r = 1, \ldots, K$.


[Diagram: for each system $r = 1, \ldots, K$, the failure ages $X_{1r}, X_{2r}, \ldots, X_{N_r(T),r}$ marked on a time axis from 0 to T.]

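The maximum likelihood estimate (MLE) of $\beta$ is

$$\hat\beta = \frac{\sum_{r=1}^{K} N_r(T)}{\sum_{r=1}^{K}\sum_{i=1}^{N_r(T)} \ln\left(\dfrac{T}{X_{ir}}\right)}.$$

The MLE of $\lambda$ is

$$\hat\lambda = \frac{\sum_{r=1}^{K} N_r(T)}{K\,T^{\hat\beta}}.$$

(All logs are with respect to base e.)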
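The two MLE formulas translate directly into code. This sketch assumes every system is observed on the same interval [0, T], and the helper names are hypothetical. The usage checks the per-system sums against those printed in Table 1 for systems 2 and 3. It then reproduces the estimates from the printed totals N(T) = 36 and 58.493.

```python
import math
from typing import Sequence, Tuple


def log_sum(ages: Sequence[float], T: float) -> float:
    """Sum of ln(T / X_ir) over one system's failure ages on [0, T]."""
    return sum(math.log(T / x) for x in ages)


def mle_from_totals(n: int, s: float, K: int, T: float) -> Tuple[float, float]:
    """(beta_hat, lambda_hat) from n = N(T) and s = sum over all systems of ln(T / X_ir)."""
    beta_hat = n / s
    lambda_hat = n / (K * T ** beta_hat)
    return beta_hat, lambda_hat


def duane_mle(systems: Sequence[Sequence[float]], T: float) -> Tuple[float, float]:
    """MLEs for K systems, each observed on [0, T]; systems[r] lists its failure ages."""
    n = sum(len(ages) for ages in systems)
    s = sum(log_sum(ages, T) for ages in systems)
    return mle_from_totals(n, s, len(systems), T)


# Failure ages of systems 2 and 3 from Table 1.
system2 = [0.1, 5.6, 18.6, 19.5, 24.2, 26.7, 45.1, 45.8, 75.7, 79.7,
           98.6, 120.1, 161.8, 180.6, 190.8]
system3 = [8.4, 32.5, 44.7, 48.4, 50.6, 73.6, 98.7, 112.2, 129.8, 136.0, 195.8]

print(round(log_sum(system2, 200.0), 3), round(log_sum(system3, 200.0), 3))
print(mle_from_totals(36, 58.493, 3, 200.0))  # totals printed in Table 1
```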


Example 

Suppose K = 3 systems were tested for time T = 200. This experiment was simulated on a computer with $\lambda = 0.6$ and $\beta = 0.5$. The results are given in Table 1, where $X_{ir}$ is the age of the r-th system at the i-th failure. From this simulation, the MLE of $\beta$ is

$$\hat\beta = 0.615$$

and the MLE of $\lambda$ is

$$\hat\lambda = 0.461.$$


The Duane model states that if development of the system is stopped at T = 200 hours of testing, then the times between failures of the system thereafter will follow the exponential distribution

$$F(x) = 1 - e^{-x/M(T)}, \qquad x > 0,$$

where

$$M(T) = [r(T)]^{-1} = \frac{T^{1-\beta}}{\lambda\beta}.$$

Based on 200 hours of testing, the MLE of $M(T)$ is

$$\hat{M}(200) = \frac{200^{1-\hat\beta}}{\hat\lambda\hat\beta} = 27.12.$$


If development is stopped at, say, T = 300 hours of testing, the model states that future times between failures will also follow the exponential distribution, but with mean

$$M(300) = \frac{300^{1-\beta}}{\lambda\beta}.$$

Based on 200 hours of testing, the projection of the MTBF at 300 hours of testing is

$$\hat{M}(300) = \frac{300^{1-\hat\beta}}{\hat\lambda\hat\beta} = 31.70.$$
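Here is a sketch of the MTBF estimate and its projection, plugging in the MLEs from this example. The names `mtbf_hat` and `prob_failure_within` are illustrative. The second function is simply the exponential distribution F(x) stated above.

```python
import math


def mtbf_hat(T: float, lambda_hat: float, beta_hat: float) -> float:
    """Estimated MTBF at development time T: T**(1 - beta) / (lambda * beta)."""
    return T ** (1.0 - beta_hat) / (lambda_hat * beta_hat)


def prob_failure_within(x: float, mean: float) -> float:
    """Exponential CDF F(x) = 1 - exp(-x / M(T)) for the time to the next failure."""
    return 1.0 - math.exp(-x / mean)


beta_hat, lambda_hat = 0.615, 0.461  # MLEs from the example
m200 = mtbf_hat(200.0, lambda_hat, beta_hat)
m300 = mtbf_hat(300.0, lambda_hat, beta_hat)  # projection if testing continued to T = 300
print(f"M(200) = {m200:.2f}, M(300) = {m300:.2f}")
print(prob_failure_within(10.0, m200))  # chance of a failure within 10 hours after T = 200
```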
TABLE 1

Simulated Data for K = 3 Systems Operated for Time T = 200 when $\lambda = 0.6$ and $\beta = 0.5$

   X_i1      X_i2      X_i3
    0.3       0.1       8.4
    4.4       5.6      32.5
   10.2      18.6      44.7
   23.5      19.5      48.4
   23.8      24.2      50.6
   26.4      26.7      73.6
   74.0      45.1      98.7
   77.1      45.8     112.2
   92.1      75.7     129.8
  197.2      79.7     136.0
             98.6     195.8
            120.1
            161.8
            180.6
            190.8

$N_1(T) = 10, \quad N_2(T) = 15, \quad N_3(T) = 11$

$\sum_{i=1}^{15} \ln(T/X_{i2}) = 26.434, \quad \sum_{i=1}^{11} \ln(T/X_{i3}) = 12.398$

$N(T) = N_1(T) + N_2(T) + N_3(T) = 36$

$\sum_{r=1}^{3}\sum_{i=1}^{N_r(T)} \ln(T/X_{ir}) = 58.493$

$\hat\beta = N(T) \Big/ \sum_{r=1}^{3}\sum_{i=1}^{N_r(T)} \ln(T/X_{ir}) = 0.615$

$\hat\lambda = N(T) / (K T^{\hat\beta}) = 0.461$
Hypotheses Tests on $\beta$


Let N(T) be the total number of failures for the K systems. That is,

$$N(T) = N_1(T) + N_2(T) + \cdots + N_K(T).$$

Conditioned on $N(T) = n$ ($n$ a fixed integer), the random variable

$$2\beta \sum_{r=1}^{K}\sum_{i=1}^{N_r(T)} \ln\left(\frac{T}{X_{ir}}\right)$$

has the chi-square distribution with $2n$ degrees of freedom.

This result may be used in the usual fashion to test hypotheses on the true value of $\beta$.
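The following sketch shows one way to use the chi-square result. It treats $2\beta_0\sum\sum \ln(T/X_{ir})$ as chi-square with 2n degrees of freedom under $H_0: \beta = \beta_0$. To stay within the Python standard library, it approximates chi-square percentiles with the Wilson-Hilferty formula. That formula is our substitution for exact tables, and the function names are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Tuple


def chi2_quantile(p: float, df: int) -> float:
    """Wilson-Hilferty approximation to the p-quantile of chi-square(df)."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * math.sqrt(c)) ** 3


def chi2_test_beta(
    beta0: float, n: int, s: float, alpha: float = 0.05
) -> Tuple[float, Tuple[float, float], bool]:
    """Two-sided test of H0: beta = beta0 from n failures and s = sum of ln(T / X_ir).

    Under H0, 2 * beta0 * s is treated as chi-square with 2n degrees of freedom.
    Returns (statistic, acceptance region, whether H0 is rejected).
    """
    stat = 2.0 * beta0 * s
    lower = chi2_quantile(alpha / 2.0, 2 * n)
    upper = chi2_quantile(1.0 - alpha / 2.0, 2 * n)
    return stat, (lower, upper), not (lower <= stat <= upper)


# Table 1 totals: n = 36 failures, s = 58.493.
print(chi2_test_beta(1.0, 36, 58.493))  # H0: no growth (constant failure rate)
print(chi2_test_beta(0.5, 36, 58.493))  # H0: beta equals the simulation value
```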




When n is moderate in size, one may use the fact that

$$\frac{\beta \sum_{r=1}^{K}\sum_{i=1}^{N_r(T)} \ln\left(\dfrac{T}{X_{ir}}\right) - n}{\sqrt{n}}$$

is approximately normally distributed with mean 0 and variance 1 to test hypotheses on the true value of $\beta$.
respectively, where $\hat\beta$ is the MLE of $\beta$, and $z$ is the corresponding percentile of the normal distribution with mean 0 and variance 1.

Example

Consider again the simulated results presented in Table 1 when K = 3 and T = 200.

The MLE of β was computed to be

$$\hat\beta = 0.615.$$

Conditioned on N = 36, 90 percent approximate confidence bounds on β are

$$\text{LCB} = \hat\beta\left(1 - \frac{1.645}{\sqrt{36}}\right) = 0.446, \qquad \text{UCB} = \hat\beta\left(1 + \frac{1.645}{\sqrt{36}}\right) = 0.784.$$
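The bounds follow from the normal approximation: since $\hat\beta = n/S$, where S is the double log sum, the interval $n \pm z\sqrt{n}$ for $\beta S$ maps to $\hat\beta(1 \mp z/\sqrt{n})$. A minimal Python sketch of that step (the function name is hypothetical; the percentile comes from the standard library):

```python
from statistics import NormalDist


def beta_bounds_normal(beta_hat, n, conf=0.90):
    """Approximate two-sided bounds on beta, using
    (beta * S - n) / sqrt(n) ~ N(0, 1) with beta_hat = n / S."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - conf) / 2.0)  # about 1.645 for 90 percent
    half_width = z / n ** 0.5
    return beta_hat * (1.0 - half_width), beta_hat * (1.0 + half_width)


lcb, ucb = beta_bounds_normal(0.615, 36)
print(f"LCB = {lcb:.3f}, UCB = {ucb:.3f}")  # LCB = 0.446, UCB = 0.784
```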

Hypotheses Tests and Confidence Bounds on λ (β Known)

Suppose β is equal to some known value $\beta_0$, say, and that K systems have operated for time T during development. Then the random number of failures, N(T), for the K systems during [0, T] has the Poisson distribution with mean

$$\theta = K\lambda T^{\beta_0}.$$

This result may be used to test hypotheses or construct confidence bounds on λ when β is known.
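One way to obtain the bounds on θ is the usual equal-tailed ("exact") Poisson interval. The Python sketch below finds it by bisection on the Poisson CDF instead of reading chi-square tables, then divides by $K T^{\beta_0}$ to get bounds on λ. This is an assumed construction; the function names are hypothetical. For N(T) = 36 it gives roughly (25.2, 49.8), so the lower bound can differ from a tabulated value in the last digit.

```python
import math


def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu), summing terms in log space."""
    if k < 0:
        return 0.0
    return sum(math.exp(j * math.log(mu) - mu - math.lgamma(j + 1))
               for j in range(k + 1))


def bisect_root(f, lo, hi, iters=200):
    """Root of f on [lo, hi]; assumes f(lo) and f(hi) differ in sign."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def poisson_mean_bounds(n, conf=0.95):
    """Equal-tailed two-sided confidence bounds on a Poisson mean theta."""
    a = (1.0 - conf) / 2.0
    top = 10.0 * n + 50.0  # generous upper end of the search interval
    # Lower bound solves P(X >= n | theta) = a.
    lower = 0.0 if n == 0 else bisect_root(
        lambda mu: 1.0 - poisson_cdf(n - 1, mu) - a, 1e-9, top)
    # Upper bound solves P(X <= n | theta) = a.
    upper = bisect_root(lambda mu: poisson_cdf(n, mu) - a, 1e-9, top)
    return lower, upper


K, T, beta0 = 3, 200.0, 0.5
theta_lo, theta_hi = poisson_mean_bounds(36)
scale = K * T ** beta0          # theta = K * lambda * T**beta0
lam_lo, lam_hi = theta_lo / scale, theta_hi / scale
print(f"theta: ({theta_lo:.1f}, {theta_hi:.1f}); lambda: ({lam_lo:.3f}, {lam_hi:.3f})")
```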

Example

Assume K = 3 systems were operated for time T = 200 and N(T) = 36 failures were observed. Suppose, also, that β is known to equal 0.5. Two-sided 95 percent confidence bounds on θ are (25.1, 49.8). Consequently, two-sided 95 percent


*That is, our assurance that the parameter of interest will lie within the stated bounds is at least, instead of exactly equal to, a specified value.


respectively. Consequently, (1-α)(1-γ)×100 percent lower and upper (conservative) confidence bounds on the MTBF at time T, M(T), are

$$M_1 = \frac{KT}{\beta_2\,\theta_2} \qquad\text{and}\qquad M_2 = \frac{KT}{\beta_1\,\theta_1}.$$

Example

Consider again the simulated results presented in Table 1 when K = 3 and T = 200. The approximate 95 percent upper confidence bound on β is

$$\beta_2 = 0.784.$$

Also, based on N = 36 failures, the 97.5 percent upper confidence bound on

$$\theta = K\lambda T^{\beta}$$

is

$$\theta_2 = 49.8.$$

Hence, the (0.950)(0.975) = 0.92625, i.e., 92.625 percent (conservative) upper confidence bound on the failure rate

$$r(T) = \lambda\beta T^{\beta-1} = \frac{\beta\theta}{KT}$$

at time T is

$$r_2 = \frac{\beta_2\,\theta_2}{KT} = 0.065.$$

Consequently, the 92.625 percent (conservative) lower confidence bound on the MTBF

$$M(T) = \frac{1}{r(T)}$$

at time T is

$$M_1 = \frac{1}{r_2} = 15.385.$$
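The arithmetic of this example can be checked with a few lines of Python (the function name is hypothetical). Here $r(T) = \beta\theta/(KT)$ follows from $\theta = K\lambda T^{\beta}$. The text rounds $r_2$ to .065 before inverting, which gives 15.385; inverting the unrounded $r_2$ gives about 15.37.

```python
def mtbf_lower_bound(beta_upper, theta_upper, k_systems, t_end):
    """Conservative bounds: r_2 = beta_2 * theta_2 / (K * T), M_1 = 1 / r_2."""
    r_upper = beta_upper * theta_upper / (k_systems * t_end)
    return r_upper, 1.0 / r_upper


level = 0.95 * 0.975                      # product of the two one-sided levels
r2, m1 = mtbf_lower_bound(0.784, 49.8, 3, 200.0)
print(f"level = {level:.5f}, r2 = {r2:.4f}, M1 = {m1:.2f}")
```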
MODIFIED PROPAGATION OF ERRORS WITH APPLICATIONS
TO  MAINTAINABILITY  AND  AVAILABILITY 

Paul  C.  Cox 

Quality  Assurance  Office 
White  Sands  Missile  Range,  New  Mexico 


ABSTRACT. A modification of the conventional method of "Propagation of Errors" is proposed. This modified method promises to have numerous applications, is frequently more easily applied than conventional propagation of errors, and, for a few functions of random variables which have been studied, provides improved approximations of confidence limits over conventional propagation of errors as well as over other well known methods. Modified Propagation of Errors (MPE) is described, applications to "mean time to repair" and "availability" are illustrated, and the extent of error caused by using MPE is discussed. Finally, to illustrate another application, MPE is used to approximate confidence limits for system reliability from confidence limits for component reliability.

1. INTRODUCTION.

a. The method of "Propagation of Errors," by which the variance of a function of variables is determined from the variances of the individual variables, is well known. After obtaining the variance, it is then possible to at least approximate errors, confidence limits, and levels of significance for the function of variables. This discussion will be centered around determining confidence limits for a function of variables.

b. In the event of a linear function of independent, random, normal variables, the function is also normal, and there is no error in the variance obtained by propagation of errors, assuming there is no error in the individual variances. It follows that the concept of propagation of errors is very useful when evaluating a linear function of independent sample means.

c.  Propagation  of  errors  is  frequently  used  when  the  function  is  not 
linear  and/or  the  variables  are  not  normal.  Since  the  application  of 
propagation  of  errors  is  usually  followed  by  an  assumption  of  normality 
for  the  function,  the  procedure  can  be  expected  to  result  in  an  error  in 
the  confidence  limits  approximated  by  this  method.  It  is  the  purpose  of 
MPE  to  broaden  the  application  of  propagation  of  errors  and  to  reduce 

the  error  when  certain  variables  are  not  normal.  The  procedure  proposed 
to  modify  the  propagation  of  errors  has  the  following  characteristics: 
(1) Simple in concept and easy to apply.

(2) Can be applied to a large variety of functions of independent variables; there are many functions to which MPE can easily be applied but to which it may be extremely difficult to apply conventional propagation of errors.

(3) In almost all areas studied, MPE provided results which were as good as or better than those provided by conventional propagation of errors. In numerous cases, MPE provided almost errorless estimates while large errors were noted when using conventional propagation of errors.

d. While MPE is applicable to a wide variety of functions of independent variables (the main requirement being that confidence limits can be obtained for each variable), the extent of error must be determined on a case-by-case basis. A number of applications have been studied. For example, the old problem of obtaining confidence limits for a system when the confidence limits for the components are known is discussed in Section 10. However, this paper is primarily concerned with the application of MPE to approximating confidence limits for mean time to repair (MTTR) and availability (A). Fortunately, certain recent reports, references e-h, have provided tables of exact confidence limits, thus providing a means for determining the error when approximating confidence limits by MPE.

2. THE METHOD.

a. MPE is applicable to virtually any function of any type of random, independent variables, as long as confidence limits can be obtained for all of the random variables within the function. The method of MPE will be described and compared with conventional propagation of errors. The two methods will be illustrated using a linear function y of three random, normal variables ($x_1$, $x_2$, and $x_3$) and two sided 90% confidence limits for the mean $\mu_y$.

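b. The method of conventional propagation of errors.

Let $y = f(x_1, x_2, x_3, \dots, x_k)$
and $\bar y = f(\bar x_1, \bar x_2, \dots, \bar x_k)$,
where $\bar y$ is some type of average. Then

$$\sigma_{\bar y}^2 = \left(\frac{\partial f}{\partial x_1}\right)^2\sigma_{\bar x_1}^2 + \left(\frac{\partial f}{\partial x_2}\right)^2\sigma_{\bar x_2}^2 + \dots + \left(\frac{\partial f}{\partial x_k}\right)^2\sigma_{\bar x_k}^2.$$

If y is a linear function:

$$y = a_1x_1 + a_2x_2 + \dots + a_kx_k,$$

$$\mu_y = a_1\mu_{x_1} + a_2\mu_{x_2} + \dots + a_k\mu_{x_k},$$

and

$$\bar y = a_1\bar x_1 + a_2\bar x_2 + \dots + a_k\bar x_k;$$

also

$$\sigma_y^2 = a_1^2\sigma_{x_1}^2 + a_2^2\sigma_{x_2}^2 + \dots + a_k^2\sigma_{x_k}^2,$$

$$s_y^2 = a_1^2 s_{x_1}^2 + a_2^2 s_{x_2}^2 + \dots + a_k^2 s_{x_k}^2,$$

$$s_{\bar y}^2 = \frac{a_1^2}{n_1}s_{x_1}^2 + \frac{a_2^2}{n_2}s_{x_2}^2 + \dots + \frac{a_k^2}{n_k}s_{x_k}^2,$$

and the value of $\bar y$, the appropriate variance, and the assumption of normality are used to determine (or approximate) the desired confidence limits.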

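c. The method of modified propagation of errors (MPE).

$$d_h^2 = \left(\frac{\partial f}{\partial x_1}\right)^2 h_1^2 + \left(\frac{\partial f}{\partial x_2}\right)^2 h_2^2 + \dots + \left(\frac{\partial f}{\partial x_k}\right)^2 h_k^2,$$

$$d_l^2 = \left(\frac{\partial f}{\partial x_1}\right)^2 l_1^2 + \left(\frac{\partial f}{\partial x_2}\right)^2 l_2^2 + \dots + \left(\frac{\partial f}{\partial x_k}\right)^2 l_k^2,$$

where

$$h_i = (\mathrm{ucl})_i - \bar x_i, \qquad l_i = \bar x_i - (\mathrm{lcl})_i,$$

and where $\bar x_i$ is the desired average (mean, ratio, median, etc.), and $(\mathrm{ucl})_i$ and $(\mathrm{lcl})_i$ are the upper and lower confidence limits, respectively, associated with the random variable $x_i$.

¹MPE replaces σ with an interval in the propagation of errors formula. Another example of the use of this concept may be found on p. 91 in reference k.

Then:

Upper confidence limit $= \bar y + d_h$

Lower confidence limit $= \bar y - d_l$.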
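The MPE formulas can be sketched generically in Python. This is a minimal illustration, not the author's procedure. The partial derivatives are approximated by central differences at the averages, which is an assumption of the sketch. The formulas are applied literally, pairing each $h_i$ with $d_h$ and each $l_i$ with $d_l$, so the sketch assumes f increases in every $x_i$, as in the examples of this paper. The check uses the data of example 2d that follows.

```python
import math


def mpe_limits(f, means, ucls, lcls, eps=1e-6):
    """MPE confidence limits for y = f(x_1, ..., x_k).

    means      -- the averages x-bar_i
    ucls, lcls -- upper and lower confidence limits for each x_i
    Returns (y-bar - d_l, y-bar + d_h).
    """
    y_bar = f(means)
    d_h2 = d_l2 = 0.0
    for i, (m, u, lo) in enumerate(zip(means, ucls, lcls)):
        up, dn = list(means), list(means)
        up[i], dn[i] = m + eps, m - eps
        dfdx = (f(up) - f(dn)) / (2 * eps)    # central-difference partial
        d_h2 += (dfdx * (u - m)) ** 2         # h_i = (ucl)_i - x-bar_i
        d_l2 += (dfdx * (m - lo)) ** 2        # l_i = x-bar_i - (lcl)_i
    return y_bar - math.sqrt(d_l2), y_bar + math.sqrt(d_h2)


def linear_y(x):
    """y = 3 x_1 + 4 x_2 + 5 x_3, the function of example 2d."""
    return 3 * x[0] + 4 * x[1] + 5 * x[2]


means, sigmas, ns = [10.0, 12.0, 16.0], [4.0, 5.0, 6.0], [9, 16, 25]
half = [1.645 * s / math.sqrt(n) for s, n in zip(sigmas, ns)]  # 90% half-widths
lower, upper = mpe_limits(linear_y, means,
                          [m + h for m, h in zip(means, half)],
                          [m - h for m, h in zip(means, half)])
print(f"{lower:.3f} <= mu_y <= {upper:.3f}")  # 143.565 <= mu_y <= 172.435
```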

d. An example of obtaining confidence limits for the mean when y is a linear function of $x_1, x_2, \dots, x_k$.

Given: $y = 3x_1 + 4x_2 + 5x_3$. The x's are random normal variables and mutually independent, with

$\bar x_1 = 10$, $\sigma_{x_1} = 4$, $n_1 = 9$

$\bar x_2 = 12$, $\sigma_{x_2} = 5$, $n_2 = 16$

$\bar x_3 = 16$, $\sigma_{x_3} = 6$, $n_3 = 25$

To find 90%, 2 sided confidence limits for $\mu_y$:

(1) Using conventional propagation of errors.

$\bar y = 3(10) + 4(12) + 5(16) = 158$

$$\sigma_{\bar y}^2 = \frac{3^2}{9}(16) + \frac{4^2}{16}(25) + \frac{5^2}{25}(36) = 77$$

$\sigma_{\bar y} = 8.7750$

90%, 2 sided confidence limits:

$158 - 1.645(8.775) \le \mu_y \le 158 + 1.645(8.775)$

$143.565 \le \mu_y \le 172.435$

(2) Using MPE

$h_1 = l_1 = 1.645(4/3) = 2.1933$

$h_2 = l_2 = 1.645(5/4) = 2.0563$

$h_3 = l_3 = 1.645(6/5) = 1.9740$

$$d_h = d_l = \sqrt{(3 \times 2.1933)^2 + (4 \times 2.0563)^2 + (5 \times 1.9740)^2} = \sqrt{208.3626} = 14.435$$

90% 2 sided confidence limits:

$158 \pm 14.435$

$143.565 \le \mu_y \le 172.435$.


e. In the above example, the confidence limits are exact, and it really made no difference whether the conventional or the MPE method was used. This is because y is a linear function of the x's and because each x is normally distributed. The advantages of MPE become evident when the variables are not normal and/or the function is not linear, and the confidence limits can only be approximated.


3. APPLICATION OF MPE TO OBTAIN CONFIDENCE LIMITS FOR AVAILABILITY.

a. Assume that time to failure is distributed as the exponential, so that confidence limits for MTBF are obtained from the $\chi^2$ distribution. Assume that down time is distributed as the log normal.

b. There appears to be no known solution to the problem of obtaining confidence limits for availability under the above assumptions. The following exceptions are known:

(1) References c and d provide tables and procedures for confidence limits for availability under the assumption that $\sigma_z^2$ is known. The tables provided by reference c are brief and may require involved interpolation.

(2) Reference i provides a solution if time to failure is distributed as the exponential.

c. From 2e, Appendix A,
$$B = \exp\left[\ln(1/\mu_y) + \sigma_z^2/2 + \mu_z\right],$$

$$\hat B = \exp\left[\ln(1/\bar y) + s_z^2/2 + \bar z\right],$$

$$\ln(1/\bar y) + s_z^2/2 + \bar z = -2.3026 + 0.3750 + 0.9163 = -1.0113.$$

The first step is to obtain estimates and 90% 2 sided C.L. for

$$\ln(1/\mu_y) + \sigma_z^2/2 + \mu_z.$$

d. $\ln(1/\bar y) = -2.3026$.

From 3b, Appendix A, 90% C.L.:

$0.05217 \le 1/\mu_y \le 0.16055$; taking natural logs:

$-2.9533 \le \ln(1/\mu_y) \le -1.8292$

$h_1 = -1.8292 - (-2.3026) = .4734$

$l_1 = -2.3026 - (-2.9533) = .6507$

e. $s_z^2/2 = .3750$ (1f, Appendix A).

Using the $\chi^2$ with 8 d/f, 90% 2 sided C.L.:

$0.1935 \le \sigma_z^2/2 \le 1.0977$

$h_2 = 1.0977 - .3750 = .7277$

$l_2 = .3750 - .1935 = .1815$

f. $\bar z = 0.9163$, $s_z = 0.8660$, and z is normally distributed (1e, f, App A).

$t_{.95}$ (8 d/f) $= 1.8595$

$h_3 = l_3 = t\,s_z/\sqrt{n} = 0.5368$

$$d_h = \sqrt{(.4734)^2 + (.7277)^2 + (.5368)^2} = 1.0207$$

$$d_l = \sqrt{(.6507)^2 + (.1815)^2 + (.5368)^2} = 0.8628$$

$\hat B = \exp(-1.0113) = 0.3637$

Upper C.L.: $\exp(-1.0113 + 1.0207) = 1.0094$

Lower C.L.: $\exp(-1.0113 - .8628) = 0.1535$

90% C.L. for B: $0.1535 \le B \le 1.0094$

g. Estimate of Availability:

$$\hat A = \frac{1}{1 + \hat B} = \frac{1}{1 + .3637} = .7333$$

h. 90% 2 sided C.L. for availability:

$$0.4977 = \frac{1}{1 + 1.0094} \le A \le \frac{1}{1 + 0.1535} = .8669.$$
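The steps of this Section can be chained in a short Python sketch. It takes the interval endpoints quoted in 3d to 3f as inputs and does not recompute them from Appendix A. One caution: 1.0977 - .3750 evaluates to .7227, not the printed .7277. The sketch uses the printed $h_2$ so that it reproduces the printed $d_h = 1.0207$ and the limits that depend on it.

```python
import math


def rss(parts):
    """Root sum square of the per-term widths (the h_i or the l_i)."""
    return math.sqrt(sum(p * p for p in parts))


# ln B-hat = ln(1/y-bar) + s_z^2/2 + z-bar  (Section 3c)
ln_b = -2.3026 + 0.3750 + 0.9163
t_term = 1.8595 * 0.8660 / 3            # t_.95 (8 d/f) * s_z / sqrt(9)
# Upper widths h_i; h_2 is the printed .7277 (1.0977 - .3750 is .7227).
h = [-1.8292 - (-2.3026), 0.7277, t_term]
# Lower widths l_i.
l = [-2.3026 - (-2.9533), 0.3750 - 0.1935, t_term]
d_h, d_l = rss(h), rss(l)

b_hat = math.exp(ln_b)
b_lo, b_hi = math.exp(ln_b - d_l), math.exp(ln_b + d_h)
# A = 1/(1 + B) decreases as B increases, so the limits for B swap roles.
a_hat = 1 / (1 + b_hat)
a_lo, a_hi = 1 / (1 + b_hi), 1 / (1 + b_lo)
print(f"d_h = {d_h:.4f}, d_l = {d_l:.4f}")
print(f"B: {b_lo:.4f} .. {b_hi:.4f};  A: {a_lo:.4f} .. {a_hi:.4f} (A-hat {a_hat:.4f})")
```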


4. CONFIDENCE LIMITS FOR AVAILABILITY WHEN $\sigma_z^2$ IS KNOWN.

a. As stated in Section 3b(1), a solution may be obtained using the tables provided by reference c, if $\sigma_z^2$ is known and if the values for m and n are within the scope of these tables. A small section of this table is included in Appendix B. The problem illustrated in Section 3 will be reworked by MPE using $\sigma_z^2 = 0.75$, and the results will be compared with the exact values obtained by using the tables in reference c.

b. Solution by MPE.

(1) $\hat B = \exp(-2.3026 + 0.3750 + 0.9163) = \exp(-1.0113) = 0.3637$, as in Section 3g.

(2) $h_1 = 0.4734$; $l_1 = 0.6507$, as in Section 3d.

(3) $h_2 = l_2 = 0$, since $\sigma_z^2$ does not vary.

(4) $h_3 = l_3 = z\,\sigma_z/\sqrt{n} = (1.645 \times 0.8660)/3 = 0.4749$

(Note that the normal z is used here instead of Student's t, as in Section 3f.)

(5) $d_h = \sqrt{(.4734)^2 + (.4749)^2} = .6704$

$d_l = \sqrt{(.6507)^2 + (.4749)^2} = .8055$

(6) 90% 2 sided C.L. for B:

Upper C.L.: $\exp(-1.0113 + .6704) = .7111$

Lower C.L.: $\exp(-1.0113 - .8055) = .1626$

c. Solution using tables, reference c.

(1) $$\text{C.L. for } B = \frac{\exp(\sigma_z^2/2)\; b\; \bar x_g}{2m\bar y},$$

where the appropriate values for b are obtained from the tables of reference c. (Note Appendix B for an extract from this table.)

(2) 90% 2 sided C.L. for B:

From Appendix B, $b_{.05} = 7.995$; $b_{.95} = 34.852$.

Upper C.L. for $B = \dfrac{(1.455)(34.852)(2.5)}{2 \cdot 9 \cdot 10} = 0.7043$

Lower C.L. for $B = \dfrac{(1.455)(7.995)(2.5)}{2 \cdot 9 \cdot 10} = 0.1616$

d. 90% 2 sided C.L. for availability:

By MPE: $.5844 \le A \le .8602$

By Ref c: $.5868 \le A \le .8610$
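Both computations of this Section can be sketched side by side in Python. The reference c formula is coded with $\bar x_g = e^{\bar z} = 2.5$, $m = 9$ and $\bar y = 10$, the values that reproduce the printed 0.7043 and 0.1616. The b values are the Appendix B entries quoted above; the helper name is hypothetical.

```python
import math

ln_b = -1.0113                          # ln B-hat, as in Section 3
h3 = 1.645 * 0.8660 / 3                 # normal z replaces Student's t
d_h = math.hypot(0.4734, h3)            # h_2 = l_2 = 0 when sigma_z^2 is known
d_l = math.hypot(0.6507, h3)
mpe_b = (math.exp(ln_b - d_l), math.exp(ln_b + d_h))


def ref_c_limit(b, sigma2=0.75, xbar_g=2.5, m=9, ybar=10.0):
    """C.L. for B = exp(sigma_z^2 / 2) * b * xbar_g / (2 m ybar)."""
    return math.exp(sigma2 / 2) * b * xbar_g / (2 * m * ybar)


ref_b = (ref_c_limit(7.995), ref_c_limit(34.852))
for name, (lo, hi) in (("MPE", mpe_b), ("Ref c", ref_b)):
    # A = 1/(1 + B): the upper limit for B gives the lower limit for A.
    print(f"{name}: B {lo:.4f} .. {hi:.4f}, A {1 / (1 + hi):.4f} .. {1 / (1 + lo):.4f}")
```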

5. CONFIDENCE LIMITS FOR MEAN TIME TO REPAIR (MTTR).

a. Frequently, the assumption is made that time to repair is distributed as the exponential. If this is the case, confidence limits for MTTR can be obtained in the same way as for MTBF (note 3b of Appendix A). This is discussed in detail in reference i.

b. The usual assumption is that time to repair is distributed as the log normal. Using this assumption, the four reports by Charles E. Land, references e-h, provide the necessary procedures and tables for obtaining the confidence limits. The tables listed under reference h are necessary for computing the confidence limits and are extremely comprehensive. They are presently unpublished¹ and therefore are generally unavailable. A brief extract is included in Appendix B.

c. If time to repair (x) is distributed as the log normal, and $z = \ln x$, then an estimate of MTTR $= \exp(\bar z + s_z^2/2) = \exp(0.9163 + 0.3750) = \exp(1.2913) = 3.6375$.

d. Referring to the extract from Charles Land's tables in Appendix B, for $s_z = 0.866$, linear interpolation gives -0.5725 for .05 and 1.0431 for .95. These values are multiplied by $s_z$, giving -0.4958 and 0.9033.

e. Lower C.L. $= \exp(1.2913 - 0.4958) = 2.2155$

Upper C.L. $= \exp(1.2913 + 0.9033) = 8.9770$

f. The solution by MPE:

From Section 3e, $h_2 = .7277$ and $l_2 = .1815$.

From Section 3f, $h_3 = l_3 = 0.5368$.

$d_l = \sqrt{(.1815)^2 + (.5368)^2} = 0.5667$

$d_h = \sqrt{(.7277)^2 + (.5368)^2} = 0.9043$

Lower C.L. $= \exp(1.2913 - 0.5667) = 2.0637$

Upper C.L. $= \exp(1.2913 + 0.9043) = 8.9850$

g. Comparing these results indicates a conservative error for MPE in each case, and a negligible error for the more important case of upper confidence limits.
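The MTTR limits of this Section, both from Land's interpolated factors and from MPE, can be reproduced in a few lines of Python. The interpolated factors (-0.5725, 1.0431) and the widths $h_2$, $l_2$, $h_3$ are taken as printed above; $h_2 = .7277$ is the value printed in Section 3e.

```python
import math

w = 0.9163 + 0.3750                     # z-bar + s_z^2 / 2 = 1.2913
mttr = math.exp(w)

# Land's tables: interpolated factors multiplied by s_z = 0.866.
land = (math.exp(w - 0.5725 * 0.866), math.exp(w + 1.0431 * 0.866))

# MPE: combine the sigma_z^2/2 and mu_z widths from Sections 3e and 3f.
d_l = math.hypot(0.1815, 0.5368)
d_h = math.hypot(0.7277, 0.5368)        # h_2 as printed in Section 3e
mpe = (math.exp(w - d_l), math.exp(w + d_h))
print(f"MTTR = {mttr:.4f}; Land {land[0]:.4f} .. {land[1]:.4f}; "
      f"MPE {mpe[0]:.4f} .. {mpe[1]:.4f}")
```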

6. CONFIDENCE LIMITS FOR AVAILABILITY IF $\mu_y$ IS KNOWN.

a. Assume that MTBF = 10 hrs, as before, but that this value was obtained from a long history instead of a small sample.

Then $\mu_y = 10$ and $B = (0.10)(\text{MTTR})$.

¹Dr. Land is presently negotiating with certain statistical journals, and it is expected that these tables will be published before the end of 1973, probably considerably reduced in size.

b. $\hat B = (0.10)(3.6375) = 0.36375$, as in Para 3f.

c. Multiplying the confidence limits for MTTR by 0.10 gives the 90% 2 sided C.L. for B:

by Land's Tables, $0.2216 \le B \le 0.8977$,

and by MPE, $0.2064 \le B \le 0.8985$.

d. 90% confidence limits for availability:

by Land's Tables, $0.5270 \le A \le 0.8186$;

by MPE, $0.5267 \le A \le 0.8289$.

e. The error resulting from using MPE is again negligible in the important (lower C.L.) case.
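The conversion in this Section is a scaling followed by the map $A = 1/(1+B)$, which reverses the order of the limits. A minimal Python sketch, using the MTTR limits from Section 5:

```python
# With mu_y = 10 known, B = 0.10 * MTTR and A = 1 / (1 + B); because A falls as B
# rises, the lower limit for A comes from the upper limit for B.
mttr_limits = {"Land": (2.2155, 8.9770), "MPE": (2.0637, 8.9850)}
results = {}
for method, (lo, hi) in mttr_limits.items():
    b_lo, b_hi = 0.10 * lo, 0.10 * hi
    results[method] = (1 / (1 + b_hi), 1 / (1 + b_lo))
    print(f"{method}: B {b_lo:.4f} .. {b_hi:.4f}, "
          f"A {results[method][0]:.4f} .. {results[method][1]:.4f}")
```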

7. COMBINING LAND'S TABLES AND MPE TO OBTAIN CONFIDENCE LIMITS FOR AVAILABILITY.

a. If the tables by Charles Land, reference h, are available, it appears that the error in obtaining confidence limits for availability would be reduced to a minimum if Land's Tables were combined with MPE to obtain the confidence limits.

b. The problem of Section 3 will be reworked by combining these two methods.

c. From Section 3d, $h_1 = 0.4734$ and $l_1 = 0.6507$.

d. The application of Land's Tables to the solution of this problem can be obtained from Section 5d:

$h_{2,3} = 0.9033$ and $l_{2,3} = 0.4958$.

e. $d_h = \sqrt{(.4734)^2 + (.9033)^2} = 1.0197$

$d_l = \sqrt{(.6507)^2 + (.4958)^2} = 0.8180$

f. From Para 3f, $\hat B = \exp(-1.0113) = 0.3637$.

Then, 90% 2 sided C.L. for B:

$\exp(-1.0113 - .8180) \le B \le \exp(-1.0113 + 1.0197)$

$0.1605 \le B \le 1.0084$.

g. 90%, 2 sided C.L. for availability:

$$0.4979 = \frac{1}{1 + 1.0084} \le A \le \frac{1}{1 + 0.1605} = 0.8617.$$
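The combined computation can be checked the same way. In this Python sketch, Land's interval for the $\sigma_z^2/2 + \mu_z$ term (0.9033 above, 0.4958 below, from Section 5d) replaces the two separate MPE widths, and the MTBF width comes from Section 3d.

```python
import math

ln_b = -1.0113
d_h = math.hypot(0.4734, 0.9033)        # h_1 (Section 3d) with Land's upper term
d_l = math.hypot(0.6507, 0.4958)        # l_1 (Section 3d) with Land's lower term
b_lo, b_hi = math.exp(ln_b - d_l), math.exp(ln_b + d_h)
a_lo, a_hi = 1 / (1 + b_hi), 1 / (1 + b_lo)
print(f"B: {b_lo:.4f} .. {b_hi:.4f};  A: {a_lo:.4f} .. {a_hi:.4f}")
```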

8. COMPARISON OF CONFIDENCE LIMITS BY VARIOUS METHODS.

a. The following table summarizes the 90% 2 sided confidence limits for availability obtained in the previous paragraphs.

                                      90% 2 sided C.L. for B     C.L. for Availability
CONDITIONS    METHOD        SEC.      LOWER       UPPER          LOWER       UPPER
None          MPE           3g,h      0.1535      1.0094         0.4977      0.8669
None          MPE & LAND    7f,g      0.1605      1.0084         0.4979      0.8617
σ_z² known    MPE           4b,d      0.1626      0.7111         0.5844      0.8602
σ_z² known    Ref c         4c,d      0.1616      0.7043         0.5868      0.8610
μ_y known     MPE           6c,d      0.2064      0.8985         0.5267      0.8289
μ_y known     Ref h         6c,d      0.2216      0.8977         0.5270      0.8186

Table 1. Comparison of 90% 2 sided confidence limits, obtained by various methods.

b. A study of the above table suggests the following:

(1) The use of MPE usually provides conservative approximations. That is to say, MPE approximations appear to be a little larger for upper C.L. and a little smaller for lower C.L. than the true values.

(2) If the root sum square of the differences between the MPE approximations and the actual confidence limits is obtained for the cases in which $\sigma_z^2$ and $\mu_y$ are known, the following is determined:

RSS, lower: $\sqrt{(.0024)^2 + (.0003)^2} = .0024$

RSS, upper: $\sqrt{(.0008)^2 + (.0103)^2} = .0103$,

which suggests that the errors in the MPE approximations for the problem of Section 3 may be about .01 for the upper C.L. and about .0024 for the lower C.L.
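The two RSS figures follow directly from the availability columns of the table above; each pair below is (exact, MPE) for the $\sigma_z^2$-known and $\mu_y$-known cases. A minimal Python check:

```python
import math

# (exact, MPE) availability limits from the table: sigma_z^2 known, then mu_y known.
pairs_lower = [(0.5868, 0.5844), (0.5270, 0.5267)]
pairs_upper = [(0.8610, 0.8602), (0.8186, 0.8289)]


def rss(pairs):
    """Root sum square of the differences between exact and MPE limits."""
    return math.sqrt(sum((exact - approx) ** 2 for exact, approx in pairs))


rss_lower, rss_upper = rss(pairs_lower), rss(pairs_upper)
print(f"RSS lower = {rss_lower:.4f}, RSS upper = {rss_upper:.4f}")  # .0024, .0103
```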

9. ERROR IN THE USE OF MPE.

a. It is clear that the usefulness of MPE depends upon its accuracy as well as its ease of application. In Section 8, certain errors related to the specific example of this report were stated. In this Section, MPE values will be compared with exact values for a wide spectrum of parameters, using data from references c, f, and h to provide the exact comparison.

b. First, a comparison will be made of confidence limits for availability for the special case in which $\sigma_z^2$ is known. For this study, availability will be approximated by MPE, using the methods of Section 4. These approximate values will then be compared with exact values obtained by using the tables of reference c. Table 2 shows the error in using MPE for selected values of $m = n$, of $n/\sigma_z^2$, and of the probability level, for the special case of $\bar y = 4\bar x_g$.


                           PROBABILITY
m = n   n/σ_z²      .95    .90    .75    .25    .10    .05
  5     5, 12, 40   (entries largely illegible in the source)
  9        5       .004   .005   .007   .005   .003   .001
  9       12       .002   .003   .004   .002   .002   .001
  9       40       .001   .001   .001   .001   .001   .000
 13        5       .003   .004   .005   .005   .003   .002
 13       12       .002   .003   .003   .002   .001   .001
 13       40       .001   .001   .001   .001   .000   .000

Table 2. [A (exact) - A (MPE)], $\sigma_z^2$ known and $\bar y = 4\bar x_g$.


(1) From Table 2, it appears that under the conditions of this paragraph, and if m and n both exceed 5, the error should not exceed .008 if availability is approximated by MPE. As $n/\sigma_z^2$ increases, the error rapidly approaches zero. As m or n increases, the error approaches zero, but rather slowly. Errors appear to be smaller when probabilities are close to 0 or 1, and for lower confidence limits the error is conservative.

(2) To investigate the error if $\bar y \ne 4\bar x_g$, the worst case situation in Table 2 was selected, i.e., $m = n = n/\sigma_z^2 = 5$ and prob. = 0.75. Here it was found that for some examples of $\bar y < 4\bar x_g$, the error became as large as .009. As the ratio of $\bar y$ to $4\bar x_g$ increases, the error slowly decreases. For example, when the ratio reaches 20, the error is .03.

(3) From the comparisons of this Section, it appears that the error resulting from the use of MPE when determining confidence limits for availability is less than .01 in the special case in which $\sigma_z^2$ is known. MPE has the following additional advantages:

(a) It can be used when the tables of reference c are unavailable.

(b) It can be used for parameters that are not included in reference c. For example, these tables provide entries only for $5 \le m \le 13$, have no values for 2 sided 95% confidence limits, and provide a limited selection of values for $n/\sigma_z^2$.

(c) MPE may be applied when $\sigma_z^2$ can only be approximated from a sample.

c. The second study of this Section provides a comparison of the data contained in the tables of reference h with the corresponding values obtained by MPE. Furthermore, values obtained by the method of minimum variance unbiased estimators (MVUE) will be compared with the corresponding exact and MPE values. The technique of MVUE was developed by Dr. Land at the suggestion of Prof. D. R. Cox. Dr. Land has compared his exact values with several well known approximations, and generally MVUE provided better approximations than any of the other procedures. Furthermore, MVUE is essentially equivalent to the procedure of conventional propagation of errors.¹

(1) Tables 3 and 4 are adaptations of two tables prepared by Dr. Charles Land and included in reference f. In this reference, Land compares exact values of confidence limits for $\ln\mu_x = \mu_z + \tfrac12\sigma_z^2$ with the MVUE approximation

¹MVUE uses $s_w^2 = s_z^2/n + s_z^4/2(n+1)$; conventional propagation of errors uses $s_w^2 = s_z^2/n + (n-1)s_z^4/2n^2$, where $w = \bar z + \tfrac12 s_z^2$.

as well as three other approximations. Tables 3 and 4 copy Land's data for the exact and MVUE values and include the values obtained by MPE. Table 3 is for $\bar z = 1.219$ and $s_z^2 = 0.208$. Table 4 is for $\bar z = 7.650$ and $s_z^2 = 4.632$.


LEVEL          n = 11                  n = 101                 n = 1001
        EXACT   MPE    MVUE     EXACT   MPE    MVUE     EXACT   MPE    MVUE
.005     .950   .883   .952     1.205  1.200  1.200     1.285  1.284  1.284
.010     .990   .938   .988     1.216  1.212  1.212     1.288  1.288  1.288
.025    1.045  1.012  1.041     1.232  1.230  1.230     1.294  1.293  1.293
.050    1.091  1.069  1.086     1.247  1.245  1.245     1.298  1.298  1.298
.100    1.142  1.130  1.139     1.263  1.262  1.262     1.304  1.304  1.304
.250    1.227  1.225  1.226     1.291  1.291  1.291     1.313  1.313  1.313
.500    1.324  1.323  1.323     1.323  1.323  1.323
.750    1.431  1.432  1.420     1.356  1.356  1.355     1.333  1.333  1.333
.900    1.546  1.541  1.507     1.387  1.386  1.384     1.343  1.343  1.342
.950    1.629  1.619  1.560     1.405  1.404  1.401     1.348  1.348  1.348
.975    1.712  1.698  1.605     1.422  1.420  1.416     1.353  1.353  1.353
.990    1.829  1.809  1.658     1.442  1.439  1.434     1.359  1.359  1.358
.995    1.925  1.900  1.727     1.456  1.453  1.446     1.363  1.362  1.362

Table 3. One sided confidence limits for $\ln\mu_x = \mu_z + \tfrac12\sigma_z^2$, comparing MPE with exact values from Land's Tables and MVUE.

$\bar z = 1.219$ and $s_z^2 = .208$.
LEVEL           n = 11                     n = 101                    n = 1001
        EXACT    MPE     MVUE      EXACT    MPE     MVUE      EXACT    MPE     MVUE
.005     7.774   7.480   7.012      9.119   9.096   8.965      9.666   9.664   9.647
.010     8.?62   7.576   7.298      9.191   9.173   9.062      9.693   9.692   9.678
.025     8.297   8.096   7.718      9.300   9.288   9.204      9.735   9.733   9.723
.050     8.514   8.398   8.080      9.397   9.390   9.327      9.771   9.770   9.763
.100     8.787   8.723   8.496      9.515   9.511   9.468      9.813   9.812   9.808
.250     9.319   9.312   8.193      9.724   9.724   9.704      9.885   9.885   9.883
.500    10.083   9.966   9.977      9.966   9.967   9.966
.750    11.151  11.176  10.740     10.257  10.258  10.228     10.052  10.052  10.050
.900    12.555  12.568  11.436     10.?35  10.534  10.464     10.131  10.131  10.125
.950    13.712  13.717  11.853     10.716  10.712  10.606     10.179  10.179  10.170
.975    14.999  14.995  12.214     10.882  10.876  10.728     10.222  10.220  10.209
.990    16.950  16.938  12.634     11.086  11.077  10.870     10.272  10.271  10.254
.995    18.658  18.634  12.920     11.234  11.223  10.967     10.307  10.305  10.285

Table 4. One sided confidence limits for $\ln\mu_x = \mu_z + \tfrac12\sigma_z^2$, comparing the MPE approximation with exact values from Land's tables² and MVUE.

$\bar z = 7.650$ and $s_z^2 = 4.632$.


(a) Table 3 provides comparative data for an example for which $s_z^2 = 0.208$ is relatively small, and the following is evident:

1. For n = 101, either MPE or MVUE should provide a satisfactory approximation.

2. For n = 11 and probability levels less than 0.25, MVUE is somewhat better than MPE.

3. For n = 11 and levels greater than 0.75, MPE provides a much better fit than MVUE.

4. As will be shown soon, MVUE is generally under 0.25 and frequently very poor at levels above 0.75. MPE is usually at its best at levels above 0.75. For most applications, the higher levels provide the more useful levels of confidence.

(b) Table 4 uses much larger values for both $\bar z$ and $s_z^2$ than Table 3. For this example, MPE provides better results than MVUE in almost all instances. Specifically, for n = 11 and for levels ≥ .750, MPE provides excellent approximations, while the use of MVUE results in large errors.

(2) Table 5 contains a sample of data for n = 3 taken from reference h (Tables by Charles Land) and compares these data with corresponding values obtained by the methods of MVUE and MPE. Table 6 provides these comparisons in graphical form for levels 0.90 and 0.005. From these tables, it appears that even for the very small sample of 3, MPE provides a close approximation if the level is ≥ 0.25. MVUE provides acceptable approximations in some areas, frequently better than MPE. However, MVUE generally does not provide satisfactory approximations for any value of $s_z$ for levels > 50%.

(3) Table 7 provides a graphical comparison at 4 levels of exact, MPE, and MVUE values for n = 11. These graphs suggest that for a sample of this size, MPE is clearly superior to MVUE for almost all levels, and MPE should be a reasonably satisfactory approximation for any level of probability greater than or equal to 0.05.

¹Levels under 0.50 correspond to lower confidence limits for maintainability and upper confidence limits for availability (note Sections 5 and 6). Levels above 0.50 correspond to the reverse.

²Data in reference h are multiplied by $s_z$ and added to $\bar z + \tfrac12 s_z^2$ to obtain confidence limits for $\mu_z + \tfrac12\sigma_z^2$, assuming z is normal. Appendix B contains an extract from these tables for n = 9.
                         NEGATIVE                                         POSITIVE
METHOD     .005    .010    .025    .050    .100    .250       .750    .900    .950    .975    .990    .995

EXACT     4.104   3.136   2.113   1.506   1.01?    .457       .491   1.192   1.945   3.038   5.889  10.364
MPE       5.730   4.021   2.484   1.686   1.089    .471       .487   1.108   1.921   3.134   7.589  11.4?2
MVUE      1.400   1.346   1.134    .951    .741    .390       .390    .741    .951   1.134   1.346   1.490

EXACT     2.271   1.919   1.478   1.162    .858    .438       .655   2.114   4.593   9.603  24.618  49.623
MPE       5.733   4.026   2.491   1.694   1.098    .477       .779   2.383   4.903   9.872  25.410  50.079
MVUE      1.555   1.408   1.183    .993    .774    .407       .407    .774    .993   1.183   1.405   1.555

EXACT     1.753   1.546   1.269   1.053    .825    .453      1.150   4.211   9.231  19.240  49.297  99.21?
MPE       5.744   4.040   2.511   1.718   1.125    .492      1.326   4.377   9.362  19.269  49.833  99.649
MVUE      1.744   1.575   1.327   1.114    .868    .457       .457    .868   1.114   1.327   1.575   1.744

EXACT     1.622   1.489   1.296   1.130    .926    .503      2.429   8.474  18.488  38.494  98.495
MPE       5.787   4.096   2.589   1.813   1.227    .548      2.523   8.548  18.494  38.296  99.167
MVUE      2.351   2.124   1.789   1.501   1.170    .616       .616   1.170   1.502   1.789   2.124

EXACT     2.381   2.264   2.086   1.865   1.568    .789      6.172  21.221
MPE       6.078   4.472   3.081   2.370   1.785    .841      6.214  21.225  46.075  95.572
MVUE      4.790   4.326   3.645   3.059   2.383   1.254      1.254   2.333   3.059   3.645

EXACT     3.2?7   3.140   2.898   2.632   2.224   1.107      9.273  32.204
MPE       6.488   4.978   3.694   3.014   2.384   1.147      9.305  31.814  69.086  143.33
MVUE      6.990   6.313   5.319   4.464   3.478   1.830      1.830   3.478   4.464   5.319

EXACT     4.233   4.068   3.768   3.431   2.905   1.440     12.371
MPE       7.021   5.612   4.411   3.733   3.031   1.471     12.400  42.407  92.103 191.095
MVUE      9.227   8.334   7.021   5.893   4.591   2.416      2.416   4.591   5.893   7.021

Table 5. Comparison of MPE and MVUE with exact values extracted from reference h (n = 3).
(4) Using the criterion that the relative error of MPE shall not
exceed 0.1, the following table indicates generally safe areas for the
application of MPE.


SAMPLE SIZE     LEVELS OF PROB. GREATER THAN OR EQUAL TO
     3                       .250
     6                       .100
    11                       .050
    25                       .010
    50                       .005
  Above 50              No restriction
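The safe-area table can be encoded as a small lookup. This is a hypothetical helper, not part of the report. It assumes that p is the probability level being approximated, and that for a sample size falling between two tabulated rows the next smaller tabulated size governs (the more restrictive reading of the table).

```python
# Hypothetical helper encoding the safe-area table for MPE
# (criterion: relative error of MPE not to exceed 0.1).

# (sample size, smallest probability level regarded as safe)
SAFE_LEVELS = [(3, 0.250), (6, 0.100), (11, 0.050), (25, 0.010), (50, 0.005)]


def min_safe_level(n: int) -> float:
    """Smallest probability level at which MPE is regarded as safe for sample size n.

    Assumption: a sample size between two tabulated rows uses the row of the
    next smaller tabulated size. Sizes above 50 carry no restriction (0.0).
    """
    if n < SAFE_LEVELS[0][0]:
        raise ValueError("the table gives no guidance for sample sizes below 3")
    if n > 50:
        return 0.0
    level = SAFE_LEVELS[0][1]
    for size, threshold in SAFE_LEVELS:
        if n >= size:
            level = threshold
    return level


def mpe_is_safe(n: int, p: float) -> bool:
    """True if probability level p lies in the generally safe area for MPE."""
    return p >= min_safe_level(n)


print(min_safe_level(8))        # 0.1
print(mpe_is_safe(3, 0.10))     # False
print(mpe_is_safe(60, 0.001))   # True
```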


d.  From  the  discussion  of  this  Section,  the  following  conclusions  are 
drawn  about  the  errors  resulting  from  the  use  of  MPE. 

(1) If $\sigma^2$ is known and MPE is used to approximate the methods of
reference c (see Sections 4 and 9b of this report), the error can be expected
to be less than .01. For lower confidence limits the MPE error is
conservative; thus the true lower confidence limit can be expected to be
slightly larger than the approximation. For upper confidence limits the
error is not conservative, but it is usually very small.

(2) If $\mu$ is known and MPE is used to approximate confidence limits for
availability, as discussed in Section 6, both upper and lower approximations
will almost always be conservative, and in the very few instances in which
they are not conservative, the error will be small. If the regions in which the
relative error may exceed 0.1 are avoided [see para. 9c(4)], the method of
MPE should provide errors no larger than .02 in the approximate confidence
limits for availability.

(3) If the root-sum-square (RSS) of the maximum errors discussed in the two
previous paragraphs is taken, it suggests that the method of MPE, when applied
to the total problem of approximating confidence limits for availability (as
described in Section 3), should provide an error that is conservative and
does not exceed 0.025.
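The 0.025 bound in paragraph (3) follows from the root-sum-square of the two maximum errors quoted in paragraphs (1) and (2). A two-line arithmetic check (the variable names are illustrative only):

```python
import math

err_para1 = 0.01  # maximum error quoted in paragraph (1)
err_para2 = 0.02  # maximum error quoted in paragraph (2)

# Root-sum-square of the two maximum errors.
rss = math.sqrt(err_para1**2 + err_para2**2)
print(f"{rss:.4f}")  # 0.0224
```

The result, about .022, is comfortably inside the stated limit of 0.025.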
10. CONFIDENCE LIMITS FOR SYSTEM RELIABILITY FROM COMPONENT RELIABILITY DATA.

a. MPE will be used to obtain approximate confidence limits for the
reliability of a system that consists of three components in series, where
reliability estimates for the components have been obtained in three separate
tests. It is not claimed that MPE is the best method for obtaining these
approximate confidence limits; rather, it is offered to illustrate the method
when the function is a product of independent variables.

b. Assume that the results of the three component tests are those given
in the following table.


d. One question arises when dealing with a nonlinear function: what
should be substituted for the $r_i$, since they are unknown? The estimates may
be used, or the confidence limits themselves may be substituted. Some studies
have indicated that substituting confidence limits usually gives better
results. Both procedures are illustrated.
e. Using the estimates of $r_i$.

$$d_u^2 = (.95 \times .90 \times .080)^2 + (.84 \times .90 \times .041)^2 + (.84 \times .95 \times .073)^2 = .00904, \qquad d_u = .095$$

$$d_l^2 = (.95 \times .90 \times .110)^2 + (.84 \times .90 \times .099)^2 + (.84 \times .95 \times .139)^2 = .02675, \qquad d_l = .164$$

90% two-sided C.L. for the system:

$$.718 - .164 = .554 < R < .718 + .095 = .813$$

f. Using confidence limits for $r_i$.

$$d_u^2 = (.991 \times .973 \times .080)^2 + (.920 \times .973 \times .041)^2 + (.920 \times .991 \times .073)^2 = .01172, \qquad d_u = .1083$$

$$d_l^2 = (.851 \times .761 \times .110)^2 + (.730 \times .761 \times .099)^2 + (.730 \times .851 \times .139)^2 = .01556, \qquad d_l = .1247$$

90% two-sided C.L. for the system:

$$.718 - .125 = .593 < R < .718 + .108 = .826$$
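As a check on the arithmetic of paragraphs e and f, the following Python sketch recomputes both intervals. It assumes the component values implied by the worked numbers (estimates $r_1 = .84$, $r_2 = .95$, $r_3 = .90$; lower limits .730, .851, .761; upper limits .920, .991, .973), and the helper names are hypothetical. Since $R = r_1 r_2 r_3$, each partial derivative is the product of the other two components. Full precision is carried, so the printed limits can differ from the text in the third decimal.

```python
import math

# Component data implied by the worked example (90% two-sided limits).
est = [0.84, 0.95, 0.90]
lower = [0.730, 0.851, 0.761]
upper = [0.920, 0.991, 0.973]


def product_except(values, i):
    """dR/dr_i for R = r1*r2*r3: the product of the other components."""
    return math.prod(v for j, v in enumerate(values) if j != i)


def mpe_limits(est, lower, upper, at_limits=False):
    """Approximate limits for R = prod(r_i): component half-widths are
    propagated through the partial derivatives and combined by RSS."""
    r_hat = math.prod(est)
    h_low = [e - l for e, l in zip(est, lower)]
    h_up = [u - e for u, e in zip(upper, est)]
    # Paragraph e evaluates the partials at the estimates; paragraph f
    # evaluates them at the corresponding lower or upper limits.
    at_low = lower if at_limits else est
    at_up = upper if at_limits else est
    d_low = math.sqrt(sum((product_except(at_low, i) * h) ** 2 for i, h in enumerate(h_low)))
    d_up = math.sqrt(sum((product_except(at_up, i) * h) ** 2 for i, h in enumerate(h_up)))
    return r_hat - d_low, r_hat + d_up


for flag in (False, True):
    lo, hi = mpe_limits(est, lower, upper, at_limits=flag)
    print(f"partials at limits={flag}: {lo:.3f} < R < {hi:.3f}")
```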

g. An alternative approach, which avoids the dilemma of choosing a proper
value to substitute for $r_1$, $r_2$, and $r_3$, is to use (common) logarithms
and thus transform the given function into a linear function.

$$\log R = \log r_1 + \log r_2 + \log r_3$$

$$\text{est. of } \log R = -.0757 - .0223 - .0458 = -.1438$$

Log of lower C.L.:

$$\log .730 = -.1367, \qquad \log .851 = -.0701, \qquad \log .761 = -.1186$$

Log of upper C.L.:

$$\log .920 = -.0362, \qquad \log .991 = -.0039, \qquad \log .973 = -.0119$$

$$h_{l,1} = -.0757 + .1367 = .0610, \qquad h_{u,1} = -.0362 + .0757 = .0395$$

$$h_{l,2} = -.0223 + .0701 = .0478, \qquad h_{u,2} = -.0039 + .0223 = .0184$$

$$h_{l,3} = -.0458 + .1186 = .0728, \qquad h_{u,3} = -.0119 + .0458 = .0339$$

$$d_l^2 = (.0610)^2 + (.0478)^2 + (.0728)^2 = .01130, \qquad d_l = .1063$$

$$d_u^2 = (.0395)^2 + (.0184)^2 + (.0339)^2 = .00305, \qquad d_u = .0552$$

$$\text{lower C.L.} = \operatorname{antilog}(-.1438 - .1063) = .562$$

$$\text{upper C.L.} = \operatorname{antilog}(-.1438 + .0552) = .815$$
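The logarithmic approach of paragraph g can be sketched the same way. The sketch assumes base-10 logarithms (they reproduce the tabulated values, e.g. log .84 = -.0757) and the same component data as the worked example. Because log R is a sum, every partial derivative equals 1, so no value has to be substituted for the unknown $r_i$.

```python
import math

# Component data implied by the worked example (90% two-sided limits).
est = [0.84, 0.95, 0.90]
lower = [0.730, 0.851, 0.761]
upper = [0.920, 0.991, 0.973]

# Estimate of log R: sum of the component logs.
log_r_hat = sum(math.log10(r) for r in est)

# Half-widths on the log scale.
h_low = [math.log10(e) - math.log10(l) for e, l in zip(est, lower)]
h_up = [math.log10(u) - math.log10(e) for u, e in zip(upper, est)]

# All partials of a sum are 1, so the RSS uses the half-widths directly.
d_low = math.sqrt(sum(h * h for h in h_low))
d_up = math.sqrt(sum(h * h for h in h_up))

# Antilogs give the limits for R itself.
lower_cl = 10 ** (log_r_hat - d_low)
upper_cl = 10 ** (log_r_hat + d_up)
print(f"{lower_cl:.3f} < R < {upper_cl:.3f}")
```

Note that the resulting interval is not symmetric about .718, reflecting the transformation back from the log scale.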
APPENDIX A


SYMBOLS  AND  DEFINITIONS  AND  DATA  FOR  THE  EXAMPLE 

1.  General  Definitions  and  Data. 

a. m = no. of failures = 9

b. n = no. of repairs = 9

c. t = total operating time = 90 hours

d. x = repair time

e. z = ln (repair time); assume that z is normally distributed

f. $\bar z = 0.9163$; $s_z^2 = 0.75$; $s_z = 0.8660$

g. $\tilde x = \exp(\bar z)$ = geometric mean of the sample of x's = 2.50 hours

h. $\exp(s_z^2/2) = \exp(0.375) = 1.455$

i. estimate of mean time to repair (MTTR) = $\exp(\bar z + s_z^2/2) = 3.6375 = \hat x$

j. y is time to failure; assume that y is exponentially distributed

k. $\bar y = t/m$ = sample mean time between failures (MTBF) = 10 hours

l. $\mu_x$ = population mean time to repair = $\exp(\mu_z + \sigma_z^2/2)$

m. $\mu_y$ = population mean time between failures

n. C.L. = confidence limits

2. Availability Formulas.

a. $A = \mu_y/(\mu_y + \mu_x)$

b. Est. of A: $\hat A = \bar y/(\bar y + \hat x)$

c. $A = 1/(1 + B)$, where $B = \mu_x/\mu_y$

d. Est. of B: $\hat B = \hat x/\bar y = (1/\bar y)\exp(s_z^2/2 + \bar z)$

e. $B = (1/\mu_y)\exp(\sigma_z^2/2 + \mu_z) = \exp[\ln(1/\mu_y) + \sigma_z^2/2 + \mu_z]$

f. $\hat B = \exp[\ln(1/\bar y) + s_z^2/2 + \bar z]$
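A minimal Python sketch that applies the formulas of paragraph 2 to the data of paragraph 1. The variable names are illustrative; the availability estimate it produces (about .733) is only the arithmetic consequence of formulas 2b and 2f with these data.

```python
import math

# Data of Appendix A, paragraph 1.
m = 9            # number of failures
t = 90.0         # total operating time, hours
z_bar = 0.9163   # sample mean of z = ln(repair time)
s2 = 0.75        # sample variance of z

x_geo = math.exp(z_bar)              # geometric mean repair time (about 2.50 h)
mttr_hat = math.exp(z_bar + s2 / 2)  # estimated MTTR, x-hat (about 3.6375 h)
y_bar = t / m                        # sample MTBF, y-bar (10 h)

a_hat = y_bar / (y_bar + mttr_hat)                      # formula 2b
b_hat = math.exp(math.log(1 / y_bar) + s2 / 2 + z_bar)  # formula 2f

# Formula 2c: A = 1/(1 + B) must agree with formula 2b.
assert math.isclose(a_hat, 1 / (1 + b_hat))

print(f"MTTR = {mttr_hat:.4f} h, A = {a_hat:.4f}, B = {b_hat:.4f}")
```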

3. Reliability (MTBF) Formulas.

a. Consider: $1/\bar y = m/t = 9/90 = 0.10$; $\ln(1/\bar y) = -2.3026$

b. Confidence limits for $1/\mu_y$ (m is fixed, t is variable):

$$\frac{\chi^2_{2m;\,\alpha/2}}{2t} < \frac{1}{\mu_y} < \frac{\chi^2_{2m;\,1-\alpha/2}}{2t}$$

For 90% two-sided C.L.: $9.39/180 < 1/\mu_y < 28.9/180$, that is,

$$0.05217 < 1/\mu_y < 0.16055$$
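The chi-square values 9.39 and 28.9 above come from tables. As a standard-library-only sketch, the quantiles can be approximated with the Wilson-Hilferty transformation; this is a swapped-in approximation, not the report's method, and the function name is hypothetical. For 2m = 18 degrees of freedom it agrees with the tabled values to about two decimals.

```python
import math
from statistics import NormalDist


def chi2_quantile_wh(p: float, df: int) -> float:
    """Wilson-Hilferty approximation to the p-quantile of chi-square with df d.f."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * math.sqrt(c)) ** 3


m, t, alpha = 9, 90.0, 0.10  # failures, total hours, 90% two-sided
df = 2 * m

lo_rate = chi2_quantile_wh(alpha / 2, df) / (2 * t)
hi_rate = chi2_quantile_wh(1 - alpha / 2, df) / (2 * t)
print(f"{lo_rate:.5f} < 1/mu_y < {hi_rate:.5f}")

# Reciprocals give the corresponding limits for the MTBF itself.
print(f"MTBF limits: {1 / hi_rate:.2f} h to {1 / lo_rate:.2f} h")
```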

c.  If  t  is  fixed  and  m  is  variable,  then  confidence  limits  for  MTBF 
become: 




APPENDIX  B 

1. Extract from the table listed in reference 3 ($n/\sigma^2 = 9/0.75 = 12$):

g/m          8         9        10
.050     6.865     7.995     9.171
.100     8.227     9.541    10.855
.250    11.072    12.674    14.279
.500    15.211    17.201    19.191
.750    20.653    23.116    25.577
.900    26.960    29.941    32.910
.950    31.516    34.852    38.173

2. Extract from the tables of reference 8 (Charles Land). The extract is
for n - 1 = 8 degrees of freedom.

s        .025      .05      .10      .90      .95     .975
.50    -.6733   -.5657   -.4404    .5763    .8050   1.0475
.60    -.6736   -.5646   -.4419    .6115    .8620   1.1329
.70    -.6721   -.5658   -.4451    .6508    .9254   1.2264
.80    -.6733   -.5691   -.4497    .6939    .9945   1.3272
.90    -.6769   -.5743   -.4556    .7402   1.0682   1.4340
1.00   -.6825   -.5811   -.4627    .7896   1.1452   1.5455